NUCLEIC ACIDS AND PROTEINS FROM STREPTOCOCCUS GROUPS A & B 

All documents cited herein are incorporated by reference in their entirety. 
TECHNICAL FIELD 

This invention relates to nucleic acid and proteins from the bacteria Streptococcus agalactiae (GBS) and 
Streptococcus pyogenes (GAS). 

BACKGROUND ART 

Once thought to infect only cows, the Gram-positive bacterium Streptococcus agalactiae (or "group B 
streptococcus", abbreviated to "GBS") is now known to cause serious disease, bacteremia and 
meningitis, in immunocompromised individuals and in neonates. There are two types of neonatal 
infection. The first (early onset, usually within 5 days of birth) is manifested by bacteremia and 
pneumonia. It is contracted vertically as a baby passes through the birth canal. GBS colonises the vagina 
of about 25% of young women, and approximately 1% of infants born via a vaginal birth to colonised 
mothers will become infected. Mortality is between 50-70%. The second is a meningitis that occurs 10 to 
60 days after birth. If pregnant women are vaccinated with type III capsule so that the infants are 
passively immunised, the incidence of the late onset meningitis is reduced but is not entirely eliminated. 

The "B" in "GBS" refers to the Lancefield classification, which is based on the antigenicity of a 
carbohydrate which is soluble in dilute acid and called the C carbohydrate. Lancefield identified 13 types 
of C carbohydrate, designated A to O, that could be serologically differentiated. The organisms that 
most commonly infect humans are found in groups A, B, D, and G. Within group B, strains can be 
divided into 8 serotypes (Ia, Ib, Ia/c, II, III, IV, V, and VI) based on the structure of their 
polysaccharide capsule. 

Group A streptococcus ("GAS", S.pyogenes) is a frequent human pathogen, estimated to be present in 
between 5-15% of normal individuals without signs of disease. When host defences are compromised, 
or when the organism is able to exert its virulence, or when it is introduced to vulnerable tissues or hosts, 
however, an acute infection occurs. Diseases include puerperal fever, scarlet fever, erysipelas, 
pharyngitis, impetigo, necrotising fasciitis, myositis and streptococcal toxic shock syndrome. 

S.pyogenes is typically treated using antibiotics. Although S.agalactiae is also inhibited by antibiotics, 
it is not killed by penicillin as easily as GAS. Prophylactic vaccination is thus preferable. 

Current GBS vaccines are based on polysaccharide antigens, although these suffer from poor 
immunogenicity. Anti-idiotypic approaches have also been used (e.g. WO99/54457). There remains a 
need, however, for effective adult vaccines against S.agalactiae infection. There also remains a need for 
vaccines against S.pyogenes infection. 

It is an object of the invention to provide proteins which can be used in the development of such 
vaccines. The proteins may also be useful for diagnostic purposes, and as targets for antibiotics. 

DISCLOSURE OF THE INVENTION 

The invention provides proteins comprising the S.agalactiae amino acid sequences disclosed in the 
examples, and proteins comprising the S.pyogenes amino acid sequences disclosed in the examples. 
These amino acid sequences are the even SEQ IDs between 1 and 10960. 

It also provides proteins comprising amino acid sequences having sequence identity to the S.agalactiae 
amino acid sequences disclosed in the examples, and proteins comprising amino acid sequences having 
sequence identity to the S.pyogenes amino acid sequences disclosed in the examples. Depending on the 
particular sequence, the degree of sequence identity is preferably greater than 50% (e.g. 60%, 70%, 
80%, 90%, 95%, 99% or more). These proteins include homologs, orthologs, allelic variants and 
functional mutants. Typically, 50% identity or more between two proteins is considered to be an 
indication of functional equivalence. Identity between proteins is preferably determined by the 
Smith-Waterman homology search algorithm as implemented in the MPSRCH program (Oxford 
Molecular), using an affine gap search with parameters gap open penalty=12 and gap extension 
penalty=1. 
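
As a rough illustration of the identity determination described above, the following Python sketch performs a Smith-Waterman local alignment with affine gap penalties, using the stated gap open penalty of 12 and gap extension penalty of 1. The match and mismatch scores, the function name, the convention that the first gap position costs the open penalty and each further position costs the extension penalty, and the reporting of identity over aligned columns are all assumptions made for illustration; the MPSRCH program uses its own scoring matrices and reporting, which are not reproduced here.

```python
def smith_waterman_identity(a, b, match=5, mismatch=-4, gap_open=12, gap_extend=1):
    """Return (best local score, percent identity over the aligned columns).

    Illustrative scoring only: match/mismatch values are assumed, not MPSRCH's.
    """
    n, m = len(a), len(b)
    neg = float("-inf")
    # H: best score of any local alignment ending at (i, j)
    # E: best score ending with a gap in a (consumes b[j-1])
    # F: best score ending with a gap in b (consumes a[i-1])
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg] * (m + 1) for _ in range(n + 1)]
    F = [[neg] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, E[i][j], F[i][j])
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell, counting aligned columns and identical pairs.
    i, j = best_pos
    state = "H"
    matches = columns = 0
    while i > 0 and j > 0:
        if state == "H":
            if H[i][j] == 0:
                break
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            if H[i][j] == diag:
                matches += a[i - 1] == b[j - 1]
                columns += 1
                i, j = i - 1, j - 1
            elif H[i][j] == E[i][j]:
                state = "E"
            else:
                state = "F"
        elif state == "E":
            columns += 1
            if E[i][j] == H[i][j - 1] - gap_open:
                state = "H"  # the gap was opened at this column
            j -= 1
        else:
            columns += 1
            if F[i][j] == H[i - 1][j] - gap_open:
                state = "H"  # the gap was opened at this column
            i -= 1
    identity = 100.0 * matches / columns if columns else 0.0
    return best, identity


if __name__ == "__main__":
    score, identity = smith_waterman_identity("ACGTACGTACGT", "ACGTACGACGT")
    print(score, round(identity, 1))
```

A pair of sequences would then be compared against whichever identity threshold applies (e.g. greater than 50%).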

Preferred proteins of the invention are GBS1 to GBS689 (see Table IV). 

The invention further provides proteins comprising fragments of the S.agalactiae amino acid sequences 
disclosed in the examples, and proteins comprising fragments of the S.pyogenes amino acid sequences 
disclosed in the examples. The fragments should comprise at least n consecutive amino acids from the 
sequences and, depending on the particular sequence, n is 7 or more (e.g. 8, 10, 12, 14, 16, 18, 20, 30, 
40, 50, 60, 70, 80, 90, 100 or more). Preferably the fragments comprise one or more epitopes from the 
sequence. Other preferred fragments are (a) the N-terminal signal peptides of the proteins disclosed in 
the examples, (b) the proteins disclosed in the examples, but without their N-terminal signal peptides, (c) 
fragments common to the related GAS and GBS proteins disclosed in the examples, and (d) the proteins 
disclosed in the examples, but without their N-terminal amino acid residue. 

The proteins of the invention can, of course, be prepared by various means (e.g. recombinant 
expression, purification from GAS or GBS, chemical synthesis etc.) and in various forms (e.g. native, 
fusions, glycosylated, non-glycosylated etc.). They are preferably prepared in substantially pure form 
(i.e. substantially free from other streptococcal or host cell proteins) or substantially isolated form. 
Proteins of the invention are preferably streptococcal proteins. 

According to a further aspect, the invention provides antibodies which bind to these proteins. These 
may be polyclonal or monoclonal and may be produced by any suitable means (e.g. by recombinant 
expression). To increase compatibility with the human immune system, the antibodies may be chimeric 
or humanised (e.g. Breedveld (2000) Lancet 355(9205):735-740; Gorman & Clark (1990) Semin. 
Immunol. 2:457-466), or fully human antibodies may be used. The antibodies may include a detectable 
label (e.g. for diagnostic assays). 

According to a further aspect, the invention provides nucleic acid comprising the S.agalactiae 
nucleotide sequences disclosed in the examples, and nucleic acid comprising the S.pyogenes nucleotide 
sequences disclosed in the examples. These nucleic acid sequences are the odd SEQ IDs between 1 and 
10966. 

In addition, the invention provides nucleic acid comprising nucleotide sequences having sequence 
identity to the S.agalactiae nucleotide sequences disclosed in the examples, and nucleic acid comprising 
nucleotide sequences having sequence identity to the S.pyogenes nucleotide sequences disclosed in the 
examples. Identity between sequences is preferably determined by the Smith-Waterman homology 
search algorithm as described above. 

Furthermore, the invention provides nucleic acid which can hybridise to the S.agalactiae nucleic acid 
disclosed in the examples, and nucleic acid which can hybridise to the S.pyogenes nucleic acid disclosed 
in the examples, preferably under 'high stringency' conditions (e.g. 65°C in 0.1xSSC, 0.5% SDS 
solution). 

Nucleic acid comprising fragments of these sequences is also provided. These should comprise at least 
n consecutive nucleotides from the S.agalactiae or S.pyogenes sequences and, depending on the 
particular sequence, n is 10 or more (e.g. 12, 14, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 
200 or more). The fragments may comprise sequences which are common to the related GAS and GBS 
sequences disclosed in the examples. 
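
The notion of fragments of n consecutive nucleotides that are common to two related sequences can be sketched as follows. This is a minimal illustration only: the function name and the toy sequences are hypothetical, exact matching is assumed, and n defaults to 10 simply because that is the smallest value mentioned above.

```python
def shared_fragments(seq_a, seq_b, n=10):
    """Return the set of length-n substrings that occur in both sequences."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Collect every window of n consecutive residues from the first sequence.
    windows_a = {seq_a[i:i + n] for i in range(len(seq_a) - n + 1)}
    # Keep the windows of the second sequence that also occur in the first.
    return {
        seq_b[j:j + n]
        for j in range(len(seq_b) - n + 1)
        if seq_b[j:j + n] in windows_a
    }


if __name__ == "__main__":
    # Toy sequences, not taken from the sequence listing.
    print(sorted(shared_fragments("ACGTACGTAA", "TTACGTACGG", n=6)))
```

The same function works unchanged on amino acid strings, for which the smallest value of n mentioned in the text is 7.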

According to a further aspect, the invention provides nucleic acid encoding the proteins and protein 
fragments of the invention. 

The invention also provides: nucleic acid comprising nucleotide sequence SEQ ID 10967; nucleic acid 
comprising nucleotide sequences having sequence identity to SEQ ID 10967; nucleic acid which can 
hybridise to SEQ ID 10967 (preferably under 'high stringency' conditions); nucleic acid comprising a 
fragment of at least n consecutive nucleotides from SEQ ID 10967, wherein n is 10 or more (e.g. 12, 14, 
15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 
900, 1000, 1500, 2000, 3000, 4000, 5000, 10000, 100000, 1000000 or more). 

Nucleic acids of the invention can be used in hybridisation reactions (e.g. Northern or Southern blots, or 
in nucleic acid microarrays or 'gene chips') and amplification reactions (e.g. PCR, SDA, SSSR, LCR, 
TMA, NASBA etc.) and other nucleic acid techniques. 

It should also be appreciated that the invention provides nucleic acid comprising sequences 
complementary to those described above (e.g. for antisense or probing, or for use as primers). 

Nucleic acid according to the invention can, of course, be prepared in many ways (e.g. by chemical 
synthesis, from genomic or cDNA libraries, from the organism itself etc.) and can take various forms 
(e.g. single stranded, double stranded, vectors, primers, probes, labelled etc.). The nucleic acid is 
preferably in substantially isolated form. 

Nucleic acid according to the invention may be labelled e.g. with a radioactive or fluorescent label. This 
is particularly useful where the nucleic acid is to be used in nucleic acid detection techniques e.g. where 
the nucleic acid is used as a primer or as a probe in techniques such as PCR, LCR, TMA, NASBA etc. 

In addition, the term "nucleic acid" includes DNA and RNA, and also their analogues, such as those 
containing modified backbones, and also peptide nucleic acids (PNA) etc. 

According to a further aspect, the invention provides vectors comprising nucleotide sequences of the 
invention (e.g. cloning or expression vectors) and host cells transformed with such vectors. 

According to a further aspect, the invention provides compositions comprising protein, antibody, and/or 
nucleic acid according to the invention. These compositions may be suitable as immunogenic 
compositions, for instance, or as diagnostic reagents, or as vaccines. 

The invention also provides nucleic acid, protein, or antibody according to the invention for use as 
medicaments (e.g. as immunogenic compositions or as vaccines) or as diagnostic reagents. It also 
provides the use of nucleic acid, protein, or antibody according to the invention in the manufacture of: (i) 
a medicament for treating or preventing disease and/or infection caused by streptococcus; (ii) a 
diagnostic reagent for detecting the presence of streptococcus or of antibodies raised against 
streptococcus; and/or (iii) a reagent which can raise antibodies against streptococcus. Said 
streptococcus may be any species, group or strain, but is preferably S.agalactiae, especially serotype 
III or V, or S.pyogenes. Said disease may be bacteremia, meningitis, puerperal fever, scarlet fever, 
erysipelas, pharyngitis, impetigo, necrotising fasciitis, myositis or toxic shock syndrome. 

The invention also provides a method of treating a patient, comprising administering to the patient a 
therapeutically effective amount of nucleic acid, protein, and/or antibody of the invention. The patient 
may either be at risk from the disease themselves or may be a pregnant woman ('maternal immunisation' 
e.g. Glezen & Alpers (1999) Clin. Infect. Dis. 28:219-224). 

Administration of protein antigens is a preferred method of treatment for inducing immunity. 

Administration of antibodies of the invention is another preferred method of treatment. This method of 
passive immunisation is particularly useful for newborn children or for pregnant women. This method 
will typically use monoclonal antibodies, which will be humanised or fully human. 

The invention also provides a kit comprising primers (e.g. PCR primers) for amplifying a template 
sequence contained within a Streptococcus (e.g. S.pyogenes or S.agalactiae) nucleic acid sequence, the 
kit comprising a first primer and a second primer, wherein the first primer is substantially complementary 
to said template sequence and the second primer is substantially complementary to a complement of said 
template sequence, wherein the parts of said primers which have substantial complementarity define the 
termini of the template sequence to be amplified. The first primer and/or the second primer may include 
a detectable label (e.g. a fluorescent label). 

The invention also provides a kit comprising first and second single-stranded oligonucleotides which 
allow amplification of a Streptococcus template nucleic acid sequence contained in a single- or 
double-stranded nucleic acid (or mixture thereof), wherein: (a) the first oligonucleotide comprises a primer 
sequence which is substantially complementary to said template nucleic acid sequence; (b) the second 
oligonucleotide comprises a primer sequence which is substantially complementary to the complement 
of said template nucleic acid sequence; (c) the first oligonucleotide and/or the second oligonucleotide 
comprise(s) sequence which is not complementary to said template nucleic acid; and (d) said primer 
sequences define the termini of the template sequence to be amplified. The non-complementary 
sequence(s) of feature (c) are preferably upstream of (i.e. 5' to) the primer sequences. One or both of 
these (c) sequences may comprise a restriction site (e.g. EP-B-0509612) or a promoter sequence (e.g. 
EP-B-0505012). The first oligonucleotide and/or the second oligonucleotide may include a detectable 
label (e.g. a fluorescent label). 
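
How a primer pair defines the termini of the amplified template, and how a non-complementary 5' sequence ends up in the product, can be sketched in silico as follows. This is a simplified illustration under stated assumptions: exact (rather than "substantial") complementarity, a single-stranded DNA string written 5' to 3', and hypothetical function names, primers and target sequence that are not taken from the sequence listing.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon(target, fwd_primer, rev_primer, fwd_tail="", rev_tail=""):
    """Predict the top-strand product of a primer pair, or None if no product.

    fwd_primer anneals to the complement of `target` (so it reads like `target`);
    rev_primer anneals to `target` itself (so its reverse complement appears in
    `target`). The optional tails are 5' extensions that do not anneal to the
    template, e.g. a restriction site.
    """
    start = target.find(fwd_primer)
    if start < 0:
        return None
    rev_site = reverse_complement(rev_primer)
    end = target.find(rev_site, start + len(fwd_primer))
    if end < 0:
        return None
    # The primer binding sites define the two termini of the amplified region.
    core = target[start:end + len(rev_site)]
    return fwd_tail + core + reverse_complement(rev_tail)


if __name__ == "__main__":
    target = "TTACGGATCAGGTCCATGCAAGT"  # hypothetical template
    print(amplicon(target, "ACGGAT", "TGCATG"))
    print(amplicon(target, "ACGGAT", "TGCATG", fwd_tail="GAATTC"))
```

The second call shows a 5' tail (here an EcoRI recognition sequence, chosen only as an example) being incorporated upstream of the primer sequence in the product.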

The template sequence may be any part of a genome sequence (e.g. SEQ ID 10967). For example, it 
could be a rRNA gene (e.g. Turenne et al. (2000) J. Clin. Microbiol. 38:513-520; SEQ IDs 12018-12024 
herein) or a protein-coding gene. The template sequence is preferably specific to GBS. 

The invention also provides a computer-readable medium (e.g. a floppy disk, a hard disk, a CD-ROM, a 
DVD etc.) and/or a computer database containing one or more of the sequences in the sequence listing. 
The medium preferably contains SEQ ID 10967. 

The invention also provides a hybrid protein represented by the formula NH₂-A-[-X-L-]ₙ-B-COOH, 
wherein X is a protein of the invention, L is an optional linker amino acid sequence, A is an optional 
N-terminal amino acid sequence, B is an optional C-terminal amino acid sequence, and n is an integer 
greater than 1. The value of n is between 2 and x, and the value of x is typically 3, 4, 5, 6, 7, 8, 9 or 10. 
Preferably n is 2, 3 or 4; it is more preferably 2 or 3; most preferably, n = 2. For each n instances, -X- 
may be the same or different. For each n instances of [-X-L-], linker amino acid sequence -L- may be 
present or absent. For instance, when n=2 the hybrid may be NH₂-X₁-L₁-X₂-L₂-COOH, 
NH₂-X₁-X₂-COOH, NH₂-X₁-L₁-X₂-COOH, NH₂-X₁-X₂-L₂-COOH, etc. Linker amino acid sequence(s) -L- will 
typically be short (e.g. 20 or fewer amino acids i.e. 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 
3, 2, 1). Examples include short peptide sequences which facilitate cloning, poly-glycine linkers (i.e. Glyₙ 
where n = 2, 3, 4, 5, 6, 7, 8, 9, 10 or more), and histidine tags (i.e. Hisₙ where n = 3, 4, 5, 6, 7, 8, 9, 10 
or more). Other suitable linker amino acid sequences will be apparent to those skilled in the art. -A- and 
-B- are optional sequences which will typically be short (e.g. 40 or fewer amino acids i.e. 39, 38, 37, 36, 
35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 
7, 6, 5, 4, 3, 2, 1). Examples include leader sequences to direct protein trafficking, or short peptide 
sequences which facilitate cloning or purification (e.g. histidine tags i.e. Hisₙ where n = 3, 4, 5, 6, 7, 8, 9, 
10 or more). Other suitable N-terminal and C-terminal amino acid sequences will be apparent to those 
skilled in the art. In some embodiments, each X will be a GBS sequence; in others, mixtures of GAS and 
GBS will be used. 
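
The hybrid protein formula above can be read as a simple concatenation rule. The sketch below assembles an amino acid string from n protein sequences X, optional per-unit linkers L, and optional N-terminal and C-terminal sequences A and B. The function name, argument names and the short placeholder sequences are hypothetical and serve only to show the ordering of the parts.

```python
def build_hybrid(xs, linkers=None, a="", b=""):
    """Assemble A-[X-L]n-B as a single amino acid string (N- to C-terminus).

    xs      : list of n protein sequences (n must be greater than 1)
    linkers : list of n linker sequences; an empty string means "absent"
    a, b    : optional N-terminal and C-terminal sequences
    """
    if len(xs) < 2:
        raise ValueError("n must be an integer greater than 1")
    if linkers is None:
        linkers = [""] * len(xs)  # no linkers at all
    if len(linkers) != len(xs):
        raise ValueError("provide one (possibly empty) linker per X")
    body = "".join(x + linker for x, linker in zip(xs, linkers))
    return a + body + b


if __name__ == "__main__":
    # Placeholder sequences: X1-L1-X2 with a poly-glycine linker and a
    # C-terminal hexa-histidine tag as the optional B sequence.
    print(build_hybrid(["MKT", "GSA"], linkers=["GGGG", ""], b="HHHHHH"))
```

With n = 2, the four example arrangements listed above correspond to the four combinations of empty and non-empty entries in `linkers`.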

According to further aspects, the invention provides various processes. 

A process for producing proteins of the invention is provided, comprising the step of culturing a host 
cell according to the invention under conditions which induce protein expression. 

A process for producing protein or nucleic acid of the invention is provided, wherein the protein or 
nucleic acid is synthesised in part or in whole using chemical means. 

A process for detecting polynucleotides of the invention is provided, comprising the steps of: (a) 
contacting a nucleic acid probe according to the invention with a biological sample under hybridising 
conditions to form duplexes; and (b) detecting said duplexes. 

A process for detecting Streptococcus in a biological sample (e.g. blood) is also provided, comprising 
the step of contacting nucleic acid according to the invention with the biological sample under 
hybridising conditions. The process may involve nucleic acid amplification (e.g. PCR, SDA, SSSR, 
LCR, TMA, NASBA etc.) or hybridisation (e.g. microarrays, blots, hybridisation with a probe in 
solution etc.). PCR detection of Streptococcus in clinical samples, in particular S.pyogenes, has been 
reported [see e.g. Louie et al. (2000) CMAJ 163:301-309; Louie et al. (1998) J. Clin. Microbiol. 
36:1769-1771]. Clinical assays based on nucleic acid are described in general in Tang et al. (1997) Clin. 
Chem. 43:2021-2038. 

A process for detecting proteins of the invention is provided, comprising the steps of: (a) contacting an 
antibody of the invention with a biological sample under conditions suitable for the formation of 
antibody-antigen complexes; and (b) detecting said complexes. 

A process for identifying an amino acid sequence is provided, comprising the step of searching for 
putative open reading frames or protein-coding regions within a genome sequence of S.agalactiae. This 
will typically involve in silico searching the sequence for an initiation codon and for an in-frame 
termination codon in the downstream sequence. The region between these initiation and termination 
codons is a putative protein-coding sequence. Typically, all six possible reading frames will be searched. 
Suitable software for such analysis includes ORFFINDER (NCBI), GENEMARK [Borodovsky & 
McIninch (1993) Computers Chem. 17:122-133], GLIMMER [Salzberg et al. (1998) Nucleic Acids Res. 
26:544-548; Salzberg et al. (1999) Genomics 59:24-31; Delcher et al. (1999) Nucleic Acids Res. 
27:4636-4641], or other software which uses Markov models [e.g. Shmatkov et al. (1999) Bioinformatics 
15:874-876]. The invention also provides a protein comprising the identified amino acid sequence. These 
proteins can then be expressed using conventional techniques. 
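
The search described above (an initiation codon followed by an in-frame termination codon, repeated over all six reading frames) can be sketched as follows. This is a deliberately naive illustration and not a substitute for the cited programs: it assumes ATG as the only initiation codon, the standard stop codons TAA, TAG and TGA, and an uppercase DNA string; the function names and the toy input are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
START_CODON = "ATG"                 # assumption: alternative starts are ignored
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=1):
    """Return putative ORFs as (strand, frame, start, end, sequence) tuples.

    Coordinates are 0-based, end-exclusive, and refer to the strand that was
    scanned (for "-" this is the reverse complement of `seq`). Each ORF runs
    from the first ATG seen in a frame to the next in-frame stop codon.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    # Number of codons before the stop codon.
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None
    return orfs


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATTTTAGGG"):  # toy input
        print(orf)
```

In practice a minimum length cut-off (`min_codons`) would be set much higher, and the statistical gene finders cited above would be used to separate genuine coding regions from chance ORFs.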

The invention also provides a process for determining whether a test compound binds to a protein of the 
invention. If a test compound binds to a protein of the invention and this binding inhibits the life cycle of 
the GBS bacterium, then the test compound can be used as an antibiotic or as a lead compound for the 
design of antibiotics. The process will typically comprise the steps of contacting a test compound with a 
protein of the invention, and determining whether the test compound binds to said protein. Preferred 
proteins of the invention for use in these processes are enzymes (e.g. tRNA synthetases), membrane 
transporters and ribosomal proteins. Suitable test compounds include proteins, polypeptides, 
carbohydrates, lipids, nucleic acids (e.g. DNA, RNA, and modified forms thereof), as well as small 
organic compounds (e.g. MW between 200 and 2000 Da). The test compounds may be provided 
individually, but will typically be part of a library (e.g. a combinatorial library). Methods for detecting a 
binding interaction include NMR, filter-binding assays, gel-retardation assays, displacement assays, 
surface plasmon resonance, reverse two-hybrid etc. A compound which binds to a protein of the 
invention can be tested for antibiotic activity by contacting the compound with GBS bacteria and then 
monitoring for inhibition of growth. The invention also provides a compound identified using these 
methods. 

The invention also provides a composition comprising a protein of the invention and one or more of the 
following antigens: 

- a protein antigen from Helicobacter pylori such as VacA, CagA, NAP, HopX, HopY [e.g. 
WO98/04702] and/or urease. 

- a protein antigen from N.meningitidis serogroup B, such as those in WO99/24578, WO99/36544, 
WO99/57280, WO00/22430, Tettelin et al. (2000) Science 287:1809-1815, Pizza et al. (2000) 
Science 287:1816-1820 and WO96/29412, with protein '287' and derivatives being particularly 
preferred. 

- an outer-membrane vesicle (OMV) preparation from N.meningitidis serogroup B, such as those 
disclosed in WO01/52885; Bjune et al. (1991) Lancet 338(8775):1093-1096; Fukasawa et al. (1999) 
Vaccine 17:2951-2958; Rosenqvist et al. (1998) Dev. Biol. Stand. 92:323-333 etc. 

- a saccharide antigen from N.meningitidis serogroup A, C, W135 and/or Y, such as the 
oligosaccharide disclosed in Costantino et al. (1992) Vaccine 10:691-698 from serogroup C [see 
also Costantino et al. (1999) Vaccine 17:1251-1263]. 

- a saccharide antigen from Streptococcus pneumoniae [e.g. Watson (2000) Pediatr Infect Dis J 
19:331-332; Rubin (2000) Pediatr Clin North Am 47:269-285, v; Jedrzejas (2001) Microbiol Mol 
Biol Rev 65:187-207]. 

- an antigen from hepatitis A virus, such as inactivated virus [e.g. Bell (2000) Pediatr Infect Dis J 
19:1187-1188; Iwarson (1995) APMIS 103:321-326]. 

- an antigen from hepatitis B virus, such as the surface and/or core antigens [e.g. Gerlich et al. (1990) 
Vaccine 8 Suppl:S63-68 & 79-80]. 

- an antigen from hepatitis C virus [e.g. Hsu et al. (1999) Clin Liver Dis 3:901-915]. 

- an antigen from Bordetella pertussis, such as pertussis holotoxin (PT) and filamentous 
haemagglutinin (FHA) from B.pertussis, optionally also in combination with pertactin and/or 
agglutinogens 2 and 3 [e.g. Gustafsson et al. (1996) N. Engl. J. Med. 334:349-355; Rappuoli et al. 
(1991) TIBTECH 9:232-238]. 

- a diphtheria antigen, such as a diphtheria toxoid [e.g. chapter 3 of Vaccines (1988) eds. Plotkin & 
Mortimer. ISBN 0-7216-1946-0] e.g. the CRM197 mutant [e.g. Del Guidice et al. (1998) Molecular 
Aspects of Medicine 19:1-70]. 

- a tetanus antigen, such as a tetanus toxoid [e.g. chapter 4 of Plotkin & Mortimer]. 

- a saccharide antigen from Haemophilus influenzae B. 

- an antigen from N.gonorrhoeae [e.g. WO99/24578, WO99/36544, WO99/57280]. 

- an antigen from Chlamydia pneumoniae [e.g. PCT/IB01/01445; Kalman et al. (1999) Nature 
Genetics 21:385-389; Read et al. (2000) Nucleic Acids Res 28:1397-406; Shirai et al. (2000) J. 
Infect. Dis. 181(Suppl 3):S524-S527; WO99/27105; WO00/27994; WO00/37494]. 

- an antigen from Chlamydia trachomatis [e.g. WO99/28475]. 

- an antigen from Porphyromonas gingivalis [e.g. Ross et al. (2001) Vaccine 19:4135-4142]. 

- polio antigen(s) [e.g. Sutter et al. (2000) Pediatr Clin North Am 47:287-308; Zimmerman & Spann 
(1999) Am Fam Physician 59:113-118, 125-126] such as IPV or OPV. 

- rabies antigen(s) [e.g. Dreesen (1997) Vaccine 15 Suppl:S2-6] such as lyophilised inactivated virus 
[e.g. MMWR Morb Mortal Wkly Rep 1998 Jan 16;47(1):12, 19; RabAvert™]. 

- measles, mumps and/or rubella antigens [e.g. chapters 9, 10 & 11 of Plotkin & Mortimer]. 

- influenza antigen(s) [e.g. chapter 19 of Plotkin & Mortimer], such as the haemagglutinin and/or 
neuraminidase surface proteins. 

- an antigen from Moraxella catarrhalis [e.g. McMichael (2000) Vaccine 19 Suppl 1:S101-107]. 

- an antigen from Staphylococcus aureus [e.g. Kuroda et al. (2001) Lancet 357(9264):1225-1240; 
see also pages 1218-1219]. 

Where a saccharide or carbohydrate antigen is included, it is preferably conjugated to a carrier protein in 
order to enhance immunogenicity [e.g. Ramsay et al. (2001) Lancet 357(9251):195-196; Lindberg (1999) 
Vaccine 17 Suppl 2:S28-36; Conjugate Vaccines (eds. Cruse et al.) ISBN 3805549326, particularly vol. 
10:48-114 etc.]. Preferred carrier proteins are bacterial toxins or toxoids, such as diphtheria or tetanus 
toxoids. The CRM197 diphtheria toxoid is particularly preferred. Other suitable carrier proteins include 
the N.meningitidis outer membrane protein [e.g. EP-0372501], synthetic peptides [e.g. EP-0378881, 
EP-0427347], heat shock proteins [e.g. WO93/17712], pertussis proteins [e.g. WO98/58668; EP-0471177], 
protein D from H.influenzae [e.g. WO00/56360], toxin A or B from C.difficile [e.g. WO00/61761], etc. 

Any suitable conjugation reaction can be used, with any suitable linker where necessary. 

Toxic protein antigens may be detoxified where necessary (e.g. detoxification of pertussis toxin by 
chemical and/or genetic means). 



WO 02/34771 






PCT/GB01/04789 



Where a diphtheria antigen is included in the composition it is preferred also to include tetanus antigen 
and pertussis antigens. Similarly, where a tetanus antigen is included it is preferred also to include 
diphtheria and pertussis antigens. Similarly, where a pertussis antigen is included it is preferred also to 
include diphtheria and tetanus antigens. 

Antigens are preferably adsorbed to an aluminium salt. 

Antigens in the composition will typically be present at a concentration of at least 1µg/ml each. In 
general, the concentration of any given antigen will be sufficient to elicit an immune response against that 
antigen. 

The invention also provides compositions comprising two or more proteins of the present invention. 
The two or more proteins may comprise GBS sequences or may comprise GAS and GBS sequences. 

A summary of standard techniques and procedures which may be employed to perform the invention 
(e.g. to utilise the disclosed sequences for vaccination or diagnostic purposes) follows. This summary is 
not a limitation on the invention but, rather, gives examples that may be used, but are not required. 
General 

The practice of the present invention will employ, unless otherwise indicated, conventional techniques of 
molecular biology, microbiology, recombinant DNA, and immunology, which are within the skill of the 
art. Such techniques are explained fully in the literature e.g. Sambrook Molecular Cloning; A Laboratory 
Manual, Second Edition (1989); DNA Cloning, Volumes I and II (D.N. Glover ed. 1985); 
Oligonucleotide Synthesis (M.J. Gait ed. 1984); Nucleic Acid Hybridization (B.D. Hames & S.J. 
Higgins eds. 1984); Transcription and Translation (B.D. Hames & S.J. Higgins eds. 1984); Animal 
Cell Culture (R.I. Freshney ed. 1986); Immobilized Cells and Enzymes (IRL Press, 1986); B. Perbal, A 
Practical Guide to Molecular Cloning (1984); the Methods in Enzymology series (Academic Press, 
Inc.), especially volumes 154 & 155; Gene Transfer Vectors for Mammalian Cells (J.H. Miller and M.P. 
Calos eds. 1987, Cold Spring Harbor Laboratory); Mayer and Walker, eds. (1987), Immunochemical 
Methods in Cell and Molecular Biology (Academic Press, London); Scopes, (1987) Protein 
Purification: Principles and Practice, Second Edition (Springer-Verlag, N.Y.), and Handbook of 
Experimental Immunology, Volumes I-IV (D.M. Weir and C.C. Blackwell eds. 1986). 
Standard abbreviations for nucleotides and amino acids are used in this specification. 
Definitions 

A composition containing X is "substantially free of" Y when at least 85% by weight of the total X+Y in the composition is X. 
Preferably, X comprises at least about 90% by weight of the total of X+Y in the composition, more preferably at least about 95% 
or even 99% by weight. 
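
The weight-based definition above amounts to a single ratio, X/(X+Y), compared against a threshold. A minimal sketch follows; the function names are hypothetical, and the weights may be in any unit as long as both use the same one.

```python
def purity_fraction(x_weight, y_weight):
    """Return the fraction by weight of X in the total X+Y."""
    total = x_weight + y_weight
    if total <= 0:
        raise ValueError("total weight of X+Y must be positive")
    return x_weight / total


def is_substantially_free(x_weight, y_weight, threshold=0.85):
    """True when X makes up at least `threshold` of the total X+Y by weight."""
    return purity_fraction(x_weight, y_weight) >= threshold


if __name__ == "__main__":
    # 9 mg of X with 1 mg of Y gives 90% X by weight.
    print(purity_fraction(9.0, 1.0), is_substantially_free(9.0, 1.0))
```

Passing `threshold=0.90`, `0.95` or `0.99` checks the preferred levels mentioned in the definition.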

The term "comprising" means "including" as well as "consisting" e.g. a composition "comprising" X may consist exclusively of 
X or may include something additional e.g. X + Y. 
The term "heterologous" refers to two biological components that are not found together in nature. The components may be host 
cells, genes, or regulatory regions, such as promoters. Although the heterologous components are not found together in nature, 
they can function together, as when a promoter heterologous to a gene is operably linked to the gene. Another example is where a 
streptococcus sequence is heterologous to a mouse host cell. A further example would be two epitopes from the same or 
different proteins which have been assembled in a single protein in an arrangement not found in nature. 

An "origin of replication" is a polynucleotide sequence that initiates and regulates replication of polynucleotides, such as an 
expression vector. The origin of replication behaves as an autonomous unit of polynucleotide replication within a cell, capable of 
replication under its own control. An origin of replication may be needed for a vector to replicate in a particular host cell. With 
certain origins of replication, an expression vector can be reproduced at a high copy number in the presence of the appropriate 
proteins within the cell. Examples of origins are the autonomously replicating sequences, which are effective in yeast; and the viral 
T-antigen, effective in COS-7 cells. 

A "mutant" sequence is defined as DNA, RNA or amino acid sequence differing from but having sequence identity with the 
native or disclosed sequence. Depending on the particular sequence, the degree of sequence identity between the native or 
disclosed sequence and the mutant sequence is preferably greater than 50% (e.g. 60%, 70%, 80%, 90%, 95%, 99% or more, 
calculated using the Smith-Waterman algorithm as described above). As used herein, an "allelic variant" of a nucleic acid 
molecule, or region, for which nucleic acid sequence is provided herein is a nucleic acid molecule, or region, that occurs essentially 
at the same locus in the genome of another or second isolate, and that, due to natural variation caused by, for example, mutation 
or recombination, has a similar but not identical nucleic acid sequence. A coding region allelic variant typically encodes a protein 
having similar activity to that of the protein encoded by the gene to which it is being compared. An allelic variant can also 
comprise an alteration in the 5' or 3' untranslated regions of the gene, such as in regulatory control regions (e.g. see US patent 
5,753,235). 
Expression systems 

The streptococcus nucleotide sequences can be expressed in a variety of different expression systems; for example those used 
with mammalian cells, baculoviruses, plants, bacteria, and yeast. 
i. Mammalian Systems 

Mammalian expression systems are known in the art. A mammalian promoter is any DNA sequence capable of binding 
mammalian RNA polymerase and initiating the downstream (3') transcription of a coding sequence (e.g. structural gene) into 
mRNA. A promoter will have a transcription initiating region, which is usually placed proximal to the 5' end of the coding 
sequence, and a TATA box, usually located 25-30 base pairs (bp) upstream of the transcription initiation site. The TATA box is 
thought to direct RNA polymerase II to begin RNA synthesis at the correct site. A mammalian promoter will also contain an 
upstream promoter element, usually located within 100 to 200 bp upstream of the TATA box. An upstream promoter element 
determines the rate at which transcription is initiated and can act in either orientation [Sambrook et al. (1989) "Expression of 
Cloned Genes in Mammalian Cells." In Molecular Cloning: A Laboratory Manual, 2nd ed. ]. 
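
The positional relationship between the TATA box and the transcription initiation site can be sketched in a few lines of Python. The sketch below is illustrative only: the function name, the use of the consensus motif TATAAA, the convention of measuring distance from the first base of the motif, and the example sequence are all assumptions and do not come from the cited references.

```python
def find_tata_boxes(seq, tss_index, motif="TATAAA", window=(25, 30)):
    """Return (motif_start, distance_upstream) for each motif occurrence whose
    first base lies within `window` bp upstream of the transcription start site.

    `motif` and the distance convention are illustrative assumptions.
    """
    hits = []
    start = seq.find(motif)
    while start != -1:
        distance = tss_index - start
        if window[0] <= distance <= window[1]:
            hits.append((start, distance))
        start = seq.find(motif, start + 1)
    return hits


# Hypothetical promoter fragment: motif at position 50, initiation site at position 78.
promoter = "G" * 50 + "TATAAA" + "C" * 60
print(find_tata_boxes(promoter, tss_index=78))
```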

Mammalian viral genes are often highly expressed and have a broad host range; therefore sequences encoding mammalian viral
genes provide particularly useful promoter sequences. Examples include the SV40 early promoter, mouse mammary tumor virus
LTR promoter, adenovirus major late promoter (Ad MLP), and herpes simplex virus promoter. In addition, sequences derived
from non-viral genes, such as the murine metallothionein gene, also provide useful promoter sequences. Expression may be either
constitutive or regulated (inducible); depending on the promoter, expression can be induced with glucocorticoid in hormone-responsive cells.
The presence of an enhancer element (enhancer), combined with the promoter elements described above, will usually increase
expression levels. An enhancer is a regulatory DNA sequence that can stimulate transcription up to 1000-fold when linked to
homologous or heterologous promoters, with synthesis beginning at the normal RNA start site. Enhancers are also active when
they are placed upstream or downstream from the transcription initiation site, in either normal or flipped orientation, or at a
distance of more than 1000 nucleotides from the promoter [Maniatis et al. (1987) Science 236:1221; Alberts et al. (1989)
Molecular Biology of the Cell, 2nd ed.]. Enhancer elements derived from viruses may be particularly useful, because they
usually have a broader host range. Examples include the SV40 early gene enhancer [Dijkema et al. (1985) EMBO J. 4:761] and
the enhancer/promoters derived from the long terminal repeat (LTR) of the Rous Sarcoma Virus [Gorman et al. (1982b) Proc.
Natl. Acad. Sci. 79:6111] and from human cytomegalovirus [Boshart et al. (1985) Cell 41:521]. Additionally, some enhancers
are regulatable and become active only in the presence of an inducer, such as a hormone or metal ion [Sassone-Corsi and Borelli
(1986) Trends Genet. 2:215; Maniatis et al. (1987) Science 236:1237].

A DNA molecule may be expressed intracellularly in mammalian cells. A promoter sequence may be directly linked with the
DNA molecule, in which case the first amino acid at the N-terminus of the recombinant protein will always be a methionine, which
is encoded by the ATG start codon. If desired, the N-terminus may be cleaved from the protein by in vitro incubation with
cyanogen bromide.
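
As a practical note on the cyanogen bromide step, cyanogen bromide cleaves on the C-terminal side of every methionine residue, not only the initiator methionine. The following minimal Python sketch (the function name and the one-letter example sequences are hypothetical) predicts the fragments that would result, and shows why the approach is cleanest for proteins with no internal methionine.

```python
def cnbr_fragments(protein):
    """Split a one-letter protein sequence after every methionine (M),
    mimicking the specificity of cyanogen bromide cleavage."""
    fragments, current = [], ""
    for residue in protein:
        current += residue
        if residue == "M":
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


# Hypothetical sequences: one without and one with an internal methionine.
print(cnbr_fragments("MASKGEEL"))  # initiator Met released, product intact
print(cnbr_fragments("MASMKGEL"))  # internal Met also cleaved
```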

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating chimeric DNA molecules that 
encode a fusion protein comprised of a leader sequence fragment that provides for secretion of the foreign protein in mammalian
cells. Preferably, there are processing sites encoded between the leader fragment and the foreign gene that can be cleaved either
in vivo or in vitro. The leader sequence fragment usually encodes a signal peptide comprised of hydrophobic amino acids which
direct the secretion of the protein from the cell. The adenovirus tripartite leader is an example of a leader sequence that provides
for secretion of a foreign protein in mammalian cells. 
Usually, transcription termination and polyadenylation sequences recognized by mammalian cells are regulatory regions located 3'
to the translation stop codon and thus, together with the promoter elements, flank the coding sequence. The 3' terminus of the
mature mRNA is formed by site-specific post-transcriptional cleavage and polyadenylation [Birnstiel et al. (1985) Cell 41:349;
Proudfoot and Whitelaw (1988) "Termination and 3' end processing of eukaryotic RNA." In Transcription and splicing (ed.
B.D. Hames and D.M. Glover); Proudfoot (1989) Trends Biochem. Sci. 74:105]. These sequences direct the transcription of an
mRNA which can be translated into the polypeptide encoded by the DNA. Examples of transcription terminator/polyadenylation
signals include those derived from SV40 [Sambrook et al. (1989) "Expression of cloned genes in cultured mammalian cells." In
Molecular Cloning: A Laboratory Manual].

Usually, the above described components, comprising a promoter, polyadenylation signal, and transcription termination sequence
are put together into expression constructs. Enhancers, introns with functional splice donor and acceptor sites, and leader
sequences may also be included in an expression construct, if desired. Expression constructs are often maintained in a replicon,
such as an extrachromosomal element (eg. plasmids) capable of stable maintenance in a host, such as mammalian cells or
bacteria. Mammalian replication systems include those derived from animal viruses, which require trans-acting factors to replicate.
For example, plasmids containing the replication systems of papovaviruses, such as SV40 [Gluzman (1981) Cell 25:175] or
polyomavirus, replicate to extremely high copy number in the presence of the appropriate viral T antigen. Additional examples of
mammalian replicons include those derived from bovine papillomavirus and Epstein-Barr virus. Additionally, the replicon may
have two replication systems, thus allowing it to be maintained, for example, in mammalian cells for expression and in a
prokaryotic host for cloning and amplification. Examples of such mammalian-bacteria shuttle vectors include pMT2 [Kaufman et
al. (1989) Mol. Cell. Biol. 9:946] and pHEBO [Shimizu et al. (1986) Mol. Cell. Biol. 6:1074].

The transformation procedure used depends upon the host to be transformed. Methods for introduction of heterologous
polynucleotides into mammalian cells are known in the art and include dextran-mediated transfection, calcium phosphate
precipitation, polybrene mediated transfection, protoplast fusion, electroporation, encapsulation of the polynucleotide(s) in
liposomes, and direct microinjection of the DNA into nuclei.

Mammalian cell lines available as hosts for expression are known in the art and include many immortalized cell lines available from
the American Type Culture Collection (ATCC), including but not limited to, Chinese hamster ovary (CHO) cells, HeLa cells,
baby hamster kidney (BHK) cells, monkey kidney cells (COS), human hepatocellular carcinoma cells (eg. Hep G2), and a
number of other cell lines.
ii. Baculovirus Systems 

The polynucleotide encoding the protein can also be inserted into a suitable insect expression vector, and is operably linked to the 
control elements within that vector. Vector construction employs techniques which are known in the art. Generally, the 
components of the expression system include a transfer vector, usually a bacterial plasmid, which contains both a fragment of the
baculovirus genome, and a convenient restriction site for insertion of the heterologous gene or genes to be expressed; a wild type
baculovirus with a sequence homologous to the baculovirus-specific fragment in the transfer vector (this allows for the
homologous recombination of the heterologous gene into the baculovirus genome); and appropriate insect host cells and growth
media. 

After inserting the DNA sequence encoding the protein into the transfer vector, the vector and the wild type viral genome are
transfected into an insect host cell where the vector and viral genome are allowed to recombine. The packaged recombinant virus
is expressed and recombinant plaques are identified and purified. Materials and methods for baculovirus/insect cell expression
systems are commercially available in kit form from, inter alia, Invitrogen, San Diego CA ("MaxBac" kit). These techniques are
generally known to those skilled in the art and fully described in Summers and Smith, Texas Agricultural Experiment Station
Bulletin No. 1555 (1987) (hereinafter "Summers and Smith").

Prior to inserting the DNA sequence encoding the protein into the baculovirus genome, the above described components,
comprising a promoter, leader (if desired), coding sequence, and transcription termination sequence, are usually assembled into an
intermediate transplacement construct (transfer vector). This may contain a single gene and operably linked regulatory elements;
multiple genes, each with its own set of operably linked regulatory elements; or multiple genes, regulated by the same set of
regulatory elements. Intermediate transplacement constructs are often maintained in a replicon, such as an extra-chromosomal
element (e.g. plasmids) capable of stable maintenance in a host, such as a bacterium. The replicon will have a replication system,
thus allowing it to be maintained in a suitable host for cloning and amplification.

Currently, the most commonly used transfer vector for introducing foreign genes into AcNPV is pAc373. Many other vectors,
known to those of skill in the art, have also been designed. These include, for example, pVL985 (which alters the polyhedrin start
codon from ATG to ATT, and which introduces a BamHI cloning site 32 basepairs downstream from the ATT; see Luckow and
Summers, Virology (1989) 77:31).

The plasmid usually also contains the polyhedrin polyadenylation signal (Miller et al. (1988) Ann. Rev. Microbiol., 42:111) and a
prokaryotic ampicillin-resistance (amp) gene and origin of replication for selection and propagation in E.coli.

Baculovirus transfer vectors usually contain a baculovirus promoter. A baculovirus promoter is any DNA sequence capable of
binding a baculovirus RNA polymerase and initiating the downstream (5' to 3') transcription of a coding sequence (eg. structural
gene) into mRNA. A promoter will have a transcription initiation region which is usually placed proximal to the 5' end of the
coding sequence. This transcription initiation region usually includes an RNA polymerase binding site and a transcription initiation 
site. A baculovirus transfer vector may also have a second domain called an enhancer, which, if present, is usually distal to the 
structural gene. Expression may be either regulated or constitutive. 

Structural genes, abundantly transcribed at late times in a viral infection cycle, provide particularly useful promoter sequences.
Examples include sequences derived from the gene encoding the viral polyhedron protein, Friesen et al., (1986) "The Regulation
of Baculovirus Gene Expression," in: The Molecular Biology of Baculoviruses (ed. Walter Doerfler); EPO Publ. Nos. 127 839
and 155 476; and the gene encoding the p10 protein, Vlak et al., (1988), J. Gen. Virol. 69:765.

DNA encoding suitable signal sequences can be derived from genes for secreted insect or baculovirus proteins, such as the
baculovirus polyhedrin gene (Carbonell et al. (1988) Gene, 75:409). Alternatively, since the signals for mammalian cell
posttranslational modifications (such as signal peptide cleavage, proteolytic cleavage, and phosphorylation) appear to be
recognized by insect cells, and the signals required for secretion and nuclear accumulation also appear to be conserved between
the invertebrate cells and vertebrate cells, leaders of non-insect origin, such as those derived from genes encoding human
α-interferon, Maeda et al., (1985), Nature 315:592; human gastrin-releasing peptide, Lebacq-Verheyden et al., (1988), Molec.
Cell. Biol. 8:3129; human IL-2, Smith et al., (1985) Proc. Nat'l Acad. Sci. USA, 52:8404; mouse IL-3, Miyajima et al.,
(1987) Gene 58:213; and human glucocerebrosidase, Martin et al. (1988) DNA, 7:99, can also be used to provide for secretion
in insects.

A recombinant polypeptide or polyprotein may be expressed intracellularly or, if it is expressed with the proper regulatory 
sequences, it can be secreted. Good intracellular expression of nonfused foreign proteins usually requires heterologous genes that
ideally have a short leader sequence containing suitable translation initiation signals preceding an ATG start signal. If desired,
methionine at the N-terminus may be cleaved from the mature protein by in vitro incubation with cyanogen bromide.
Alternatively, recombinant polyproteins or proteins which are not naturally secreted can be secreted from the insect cell by
creating chimeric DNA molecules that encode a fusion protein comprised of a leader sequence fragment that provides for
secretion of the foreign protein in insects. The leader sequence fragment usually encodes a signal peptide comprised of
hydrophobic amino acids which direct the translocation of the protein into the endoplasmic reticulum.

After insertion of the DNA sequence and/or the gene encoding the expression product precursor of the protein, an insect cell host
is co-transformed with the heterologous DNA of the transfer vector and the genomic DNA of wild type baculovirus - usually by
co-transfection. The promoter and transcription termination sequence of the construct will usually comprise a 2-5kb section of the
baculovirus genome. Methods for introducing heterologous DNA into the desired site in the baculovirus are known in the art
(see Summers and Smith supra; Ju et al. (1987); Smith et al., Mol. Cell. Biol. (1983) 5:2156; and Luckow and Summers
(1989)). For example, the insertion can be into a gene such as the polyhedrin gene, by homologous double crossover
recombination; insertion can also be into a restriction enzyme site engineered into the desired baculovirus gene (Miller et al.,
(1989), Bioessays 4:91). The DNA sequence, when cloned in place of the polyhedrin gene in the expression vector, is flanked
both 5' and 3' by polyhedrin-specific sequences and is positioned downstream of the polyhedrin promoter.

The newly formed baculovirus expression vector is subsequently packaged into an infectious recombinant baculovirus.
Homologous recombination occurs at low frequency (between about 1% and about 5%); thus, the majority of the virus produced
after cotransfection is still wild-type virus. Therefore, a method is necessary to identify recombinant viruses. An advantage of the
expression system is a visual screen allowing recombinant viruses to be distinguished. The polyhedrin protein, which is produced
by the native virus, is produced at very high levels in the nuclei of infected cells at late times after viral infection. Accumulated
polyhedrin protein forms occlusion bodies that also contain embedded particles. These occlusion bodies, up to 15 µm in size, are
highly refractile, giving them a bright shiny appearance that is readily visualized under the light microscope. Cells infected with
recombinant viruses lack occlusion bodies. To distinguish recombinant virus from wild-type virus, the transfection supernatant is
plaqued onto a monolayer of insect cells by techniques known to those skilled in the art. Namely, the plaques are screened under
the light microscope for the presence (indicative of wild-type virus) or absence (indicative of recombinant virus) of occlusion
bodies. "Current Protocols in Microbiology" Vol. 2 (Ausubel et al. eds) at 16.8 (Supp. 10, 1990); Summers and Smith, supra;
Miller et al. (1989).
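
The practical consequence of a 1% to 5% recombination frequency can be illustrated with a short calculation. If each plaque is treated as an independent draw that is recombinant with probability f, the chance of finding at least one recombinant among n plaques is 1 - (1 - f)^n. The Python sketch below solves this for n. The independence assumption, the function name and the 95% confidence target are assumptions made for the example only.

```python
import math


def plaques_to_screen(recombinant_fraction, confidence=0.95):
    """Smallest number of plaques n such that
    1 - (1 - recombinant_fraction)**n >= confidence,
    assuming plaques are independent (an illustrative simplification)."""
    if not 0 < recombinant_fraction < 1:
        raise ValueError("recombinant_fraction must be between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - recombinant_fraction))


# Bounds of the frequency range quoted in the text.
for f in (0.01, 0.05):
    print(f"f = {f:.0%}: screen at least {plaques_to_screen(f)} plaques")
```

On these assumptions, roughly 60 to 300 plaques would need to be examined, which is why a rapid visual screen for occlusion bodies is useful.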

Recombinant baculovirus expression vectors have been developed for infection into several insect cells. For example,
recombinant baculoviruses have been developed for, inter alia: Aedes aegypti, Autographa californica, Bombyx mori,
Drosophila melanogaster, Spodoptera frugiperda, and Trichoplusia ni (WO 89/046699; Carbonell et al., (1985) J. Virol.
56:153; Wright (1986) Nature 527:718; Smith et al., (1983) Mol. Cell. Biol. 5:2156; and see generally, Fraser, et al. (1989) In
Vitro Cell. Dev. Biol. 25:225).

Cells and cell culture media are commercially available for both direct and fusion expression of heterologous polypeptides in a
baculovirus/expression system; cell culture technology is generally known to those skilled in the art. See, eg. Summers and Smith
supra.

The modified insect cells may then be grown in an appropriate nutrient medium, which allows for stable maintenance of the
plasmid(s) present in the modified insect host. Where the expression product gene is under inducible control, the host may be
grown to high density, and expression induced. Alternatively, where expression is constitutive, the product will be continuously
expressed into the medium and the nutrient medium must be continuously circulated, while removing the product of interest and
augmenting depleted nutrients. The product may be purified by such techniques as chromatography, eg. HPLC, affinity
chromatography, ion exchange chromatography, etc.; electrophoresis; density gradient centrifugation; solvent extraction, etc. As
appropriate, the product may be further purified, as required, so as to remove substantially any insect proteins which are also
present in the medium, so as to provide a product which is at least substantially free of host debris, eg. proteins, lipids and
polysaccharides.

In order to obtain protein expression, recombinant host cells derived from the transformants are incubated under conditions which
allow expression of the recombinant protein encoding sequence. These conditions will vary, dependent upon the host cell
selected. However, the conditions are readily ascertainable to those of ordinary skill in the art, based upon what is known in the
art.

iii. Plant Systems 

There are many plant cell culture and whole plant genetic expression systems known in the art. Exemplary plant cellular genetic
expression systems include those described in patents, such as: US 5,693,506; US 5,659,122; and US 5,608,143. Additional
examples of genetic expression in plant cell culture have been described by Zenk, Phytochemistry 30:3861-3863 (1991).
Descriptions of plant protein signal peptides may be found, in addition to the references described above, in Vaulcombe et al., Mol.
Gen. Genet. 209:33-40 (1987); Chandler et al., Plant Molecular Biology 3:407-418 (1984); Rogers, J. Biol. Chem.
260:3731-3738 (1985); Rothstein et al., Gene 55:353-356 (1987); Whittier et al., Nucleic Acids Research 15:2515-2535
(1987); Wirsel et al., Molecular Microbiology 3:3-14 (1989); Yu et al., Gene 122:247-253 (1992). A description of the
regulation of plant gene expression by the phytohormone gibberellic acid, and of secreted enzymes induced by gibberellic acid, can
be found in R.L. Jones and J. MacMillan, Gibberellins, in: Advanced Plant Physiology, Malcolm B. Wilkins, ed., 1984, Pitman
Publishing Limited, London, pp. 21-52. References that describe other metabolically-regulated genes: Sheen, Plant Cell,
2:1027-1038 (1990); Maas et al., EMBO J. 9:3447-3452 (1990); Benkel and Hickey, Proc. Natl. Acad. Sci. 84:1337-1339
(1987).

Typically, using techniques known in the art, a desired polynucleotide sequence is inserted into an expression cassette comprising
genetic regulatory elements designed for operation in plants. The expression cassette is inserted into a desired expression vector
with companion sequences upstream and downstream from the expression cassette suitable for expression in a plant host. The
companion sequences will be of plasmid or viral origin and provide necessary characteristics to the vector to permit the vectors to
move DNA from an original cloning host, such as bacteria, to the desired plant host. The basic bacterial/plant vector construct will
preferably provide a broad host range prokaryote replication origin; a prokaryote selectable marker; and, for Agrobacterium
transformations, T DNA sequences for Agrobacterium-mediated transfer to plant chromosomes. Where the heterologous gene is
not readily amenable to detection, the construct will preferably also have a selectable marker gene suitable for determining if a
plant cell has been transformed. A general review of suitable markers, for example for the members of the grass family, is found in
Wilmink and Dons, 1993, Plant Mol. Biol. Reptr, 11(2):165-185.

Sequences suitable for permitting integration of the heterologous sequence into the plant genome are also recommended. These
might include transposon sequences and the like for homologous recombination as well as Ti sequences which permit random
insertion of a heterologous expression cassette into a plant genome. Suitable prokaryote selectable markers include resistance
toward antibiotics such as ampicillin or tetracycline. Other DNA sequences encoding additional functions may also be present in
the vector, as is known in the art.

The nucleic acid molecules of the subject invention may be included in an expression cassette for expression of the protein(s) of
interest. Usually, there will be only one expression cassette, although two or more are feasible. The recombinant expression
cassette will contain, in addition to the heterologous protein encoding sequence, the following elements: a promoter region, plant 5'
untranslated sequences, an initiation codon (depending upon whether or not the structural gene comes equipped with one), and a
transcription and translation termination sequence. Unique restriction enzyme sites at the 5' and 3' ends of the cassette allow for
easy insertion into a pre-existing vector.

A heterologous coding sequence may be for any protein relating to the present invention. The sequence encoding the protein of
interest will encode a signal peptide which allows processing and translocation of the protein, as appropriate, and will usually lack
any sequence which might result in the binding of the desired protein of the invention to a membrane. Since, for the most part, the
transcriptional initiation region will be for a gene which is expressed and translocated during germination, by employing the signal
peptide which provides for translocation, one may also provide for translocation of the protein of interest. In this way, the
protein(s) of interest will be translocated from the cells in which they are expressed and may be efficiently harvested. Typically
secretion in seeds is across the aleurone or scutellar epithelium layer into the endosperm of the seed. While it is not required that
the protein be secreted from the cells in which the protein is produced, this facilitates the isolation and purification of the
recombinant protein.

Since the ultimate expression of the desired gene product will be in a eucaryotic cell it is desirable to determine whether any
portion of the cloned gene contains sequences which will be processed out as introns by the host's spliceosome machinery. If so,
site-directed mutagenesis of the "intron" region may be conducted to prevent losing a portion of the genetic message as a false
intron code, Reed and Maniatis, Cell 41:95-105, 1985.

The vector can be microinjected directly into plant cells by use of micropipettes to mechanically transfer the recombinant DNA.
Crossway, Mol. Gen. Genet, 202:179-185, 1985. The genetic material may also be transferred into the plant cell by using
polyethylene glycol, Krens, et al., Nature, 296, 72-74, 1982. Another method of introduction of nucleic acid segments is high
velocity ballistic penetration by small particles with the nucleic acid either within the matrix of small beads or particles, or on the
surface, Klein, et al., Nature, 327, 70-73, 1987 and Knudsen and Mullet, 1991, Planta, 185:330-336 teaching particle
bombardment of barley endosperm to create transgenic barley. Yet another method of introduction would be fusion of protoplasts
with other entities, either minicells, cells, lysosomes or other fusible lipid-surfaced bodies, Fraley, et al., Proc. Natl. Acad. Sci.
USA, 79, 1859-1863, 1982.

The vector may also be introduced into the plant cells by electroporation (Fromm et al., Proc. Natl. Acad. Sci. USA 82:5824,
1985). In this technique, plant protoplasts are electroporated in the presence of plasmids containing the gene construct. Electrical
impulses of high field strength reversibly permeabilize biomembranes allowing the introduction of the plasmids. Electroporated
plant protoplasts reform the cell wall, divide, and form plant callus.

All plants from which protoplasts can be isolated and cultured to give whole regenerated plants can be transformed by the present
invention so that whole plants are recovered which contain the transferred gene. It is known that practically all plants can be
regenerated from cultured cells or tissues, including but not limited to all major species of sugarcane, sugar beet, cotton, fruit and
other trees, legumes and vegetables. Some suitable plants include, for example, species from the genera Fragaria, Lotus,
Medicago, Onobrychis, Trifolium, Trigonella, Vigna, Citrus, Linum, Geranium, Manihot, Daucus, Arabidopsis, Brassica,
Raphanus, Sinapis, Atropa, Capsicum, Datura, Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia, Digitalis,
Majorana, Cichorium, Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Hemerocallis, Nemesia, Pelargonium,
Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis, Cucumis, Browallia, Glycine, Lolium, Zea, Triticum, Sorghum,
and Datura.

Means for regeneration vary from species to species of plants, but generally a suspension of transformed protoplasts containing 
copies of the heterologous gene is first provided. Callus tissue is formed and shoots may be induced from callus and subsequently 
rooted. Alternatively, embryo formation can be induced from the protoplast suspension. These embryos germinate as natural 
embryos to form plants. The culture media will generally contain various amino acids and hormones, such as auxin and cytokinins.
It is also advantageous to add glutamic acid and proline to the medium, especially for such species as corn and alfalfa. Shoots and
roots normally develop simultaneously. Efficient regeneration will depend on the medium, on the genotype, and on the history of
the culture. If these three variables are controlled, then regeneration is fully reproducible and repeatable.

In some plant cell culture systems, the desired protein of the invention may be excreted or alternatively, the protein may be
extracted from the whole plant. Where the desired protein of the invention is secreted into the medium, it may be collected.
Alternatively, the embryos and embryoless-half seeds or other plant tissue may be mechanically disrupted to release any secreted
protein between cells and tissues. The mixture may be suspended in a buffer solution to retrieve soluble proteins. Conventional
protein isolation and purification methods will be then used to purify the recombinant protein. Parameters of time, temperature, pH,
oxygen, and volumes will be adjusted through routine methods to optimize expression and recovery of heterologous protein.
iv. Bacterial Systems 

Bacterial expression techniques are known in the art. A bacterial promoter is any DNA sequence capable of binding bacterial
RNA polymerase and initiating the downstream (3') transcription of a coding sequence (eg. structural gene) into mRNA. A
promoter will have a transcription initiation region which is usually placed proximal to the 5' end of the coding sequence. This
transcription initiation region usually includes an RNA polymerase binding site and a transcription initiation site. A bacterial
promoter may also have a second domain called an operator, that may overlap an adjacent RNA polymerase binding site at
which RNA synthesis begins. The operator permits negative regulated (inducible) transcription, as a gene repressor protein may
bind the operator and thereby inhibit transcription of a specific gene. Constitutive expression may occur in the absence of negative
regulatory elements, such as the operator. In addition, positive regulation may be achieved by a gene activator protein binding
sequence, which, if present is usually proximal (5') to the RNA polymerase binding sequence. An example of a gene activator
protein is the catabolite activator protein (CAP), which helps initiate transcription of the lac operon in Escherichia coli (E.coli)
[Raibaud et al. (1984) Annu. Rev. Genet. 75:173]. Regulated expression may therefore be either positive or negative, thereby
either enhancing or reducing transcription.

Sequences encoding metabolic pathway enzymes provide particularly useful promoter sequences. Examples include promoter
sequences derived from sugar metabolizing enzymes, such as galactose, lactose (lac) [Chang et al. (1977) Nature 795:1056],
and maltose. Additional examples include promoter sequences derived from biosynthetic enzymes such as tryptophan (trp)
[Goeddel et al. (1980) Nuc. Acids Res. 5:4057; Yelverton et al. (1981) Nucl. Acids Res. 9:731; US patent 4,738,921;
EP-A-0036776 and EP-A-0121775]. The β-lactamase (bla) promoter system [Weissmann (1981) "The cloning of interferon and other
mistakes." In Interferon 3 (ed. I. Gresser)], bacteriophage lambda PL [Shimatake et al. (1981) Nature 292:128] and T5 [US
patent 4,689,406] promoter systems also provide useful promoter sequences.

In addition, synthetic promoters which do not occur in nature also function as bacterial promoters. For example, transcription
activation sequences of one bacterial or bacteriophage promoter may be joined with the operon sequences of another bacterial or
bacteriophage promoter, creating a synthetic hybrid promoter [US patent 4,551,433]. For example, the tac promoter is a hybrid
trp-lac promoter comprised of both trp promoter and lac operon sequences that is regulated by the lac repressor [Amann et al.
(1983) Gene 25:167; de Boer et al. (1983) Proc. Natl. Acad. Sci. 50:21]. Furthermore, a bacterial promoter can include
naturally occurring promoters of non-bacterial origin that have the ability to bind bacterial RNA polymerase and initiate
transcription. A naturally occurring promoter of non-bacterial origin can also be coupled with a compatible RNA polymerase to
produce high levels of expression of some genes in prokaryotes. The bacteriophage T7 RNA polymerase/promoter system is an
example of a coupled promoter system [Studier et al. (1986) J. Mol. Biol. 759:113; Tabor et al. (1985) Proc. Natl. Acad. Sci.
52:1074]. In addition, a hybrid promoter can also be comprised of a bacteriophage promoter and an E.coli operator region
(EPO-A-0 267 851).

In addition to a functioning promoter sequence, an efficient ribosome binding site is also useful for the expression of foreign genes
in prokaryotes. In E.coli, the ribosome binding site is called the Shine-Dalgarno (SD) sequence and includes an initiation codon
(ATG) and a sequence 3-9 nucleotides in length located 3-11 nucleotides upstream of the initiation codon [Shine et al. (1975)
Nature 254:34]. The SD sequence is thought to promote binding of mRNA to the ribosome by the pairing of bases between the
SD sequence and the 3' end of E.coli 16S rRNA [Steitz et al. (1979) "Genetic signals and nucleotide sequences in messenger
RNA." In Biological Regulation and Development: Gene Expression (ed. R.F. Goldberger)]. To express eukaryotic genes
and prokaryotic genes with weak ribosome-binding site [Sambrook et al. (1989) "Expression of cloned genes in Escherichia
coli." In Molecular Cloning: A Laboratory Manual].
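
The spacing rule for the SD sequence lends itself to a simple check on a candidate construct. The Python sketch below scans the sense strand (written in the DNA alphabet) for an SD-like motif followed, after a gap of 3 to 11 nucleotides, by an ATG. The function name, the use of AGGAGG as the motif, the convention of measuring the gap from the last base of the motif to the first base of the ATG, and the example sequence are assumptions made for illustration.

```python
def find_sd_sites(seq, sd_motif="AGGAGG", min_gap=3, max_gap=11):
    """Return (sd_start, atg_start, gap) for every ATG preceded by `sd_motif`
    with `min_gap`..`max_gap` nucleotides between motif end and ATG start.

    The motif and the gap convention are illustrative assumptions.
    """
    sites = []
    atg = seq.find("ATG")
    while atg != -1:
        sd = seq.find(sd_motif)
        # Only consider motif occurrences that end before this ATG begins.
        while sd != -1 and sd + len(sd_motif) <= atg:
            gap = atg - (sd + len(sd_motif))
            if min_gap <= gap <= max_gap:
                sites.append((sd, atg, gap))
            sd = seq.find(sd_motif, sd + 1)
        atg = seq.find("ATG", atg + 1)
    return sites


# Hypothetical construct: SD-like motif, 7-nucleotide spacer, then the start codon.
construct = "CCCC" + "AGGAGG" + "TTTTTTT" + "ATG" + "GCC"
print(find_sd_sites(construct))
```

A construct for which this returns an empty list would be a candidate for adjusting the distance between the SD sequence and the start codon.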

A DNA molecule may be expressed intracellularly. A promoter sequence may be directly linked with the DNA molecule, in which 
case the first amino acid at the N-tenninus will always be a methiorrine, which is encoded by the ATG start codon If desired, 
50 methionine at the N-terminus may be cleaved from the protein by in vitro incubation with cyanogen bromide or by either in vivo 
on in vitro incubation with a bacterial methionine N-terminal peptidase (EP-A-0 219 237). 

Fusion proteins provide an alternative to direct expression. Usually, a DNA sequence encoding the N-terminal portion of an
endogenous bacterial protein, or other stable protein, is fused to the 5' end of heterologous coding sequences. Upon expression,
this construct will provide a fusion of the two amino acid sequences. For example, the bacteriophage lambda cell gene can be
linked at the 5' terminus of a foreign gene and expressed in bacteria. The resulting fusion protein preferably retains a site for a
processing enzyme (factor Xa) to cleave the bacteriophage protein from the foreign gene [Nagai et al. (1984) Nature 309:810].
Fusion proteins can also be made with sequences from the lacZ [Jia et al. (1987) Gene 60:191], trpE [Allen et al. (1987) J.
Biotechnol. 5:93; Makoff et al. (1989) J. Gen. Microbiol. 135:11], and Chey [EP-A-0 324 647] genes. The DNA sequence at
the junction of the two amino acid sequences may or may not encode a cleavable site. Another example is a ubiquitin fusion
protein. Such a fusion protein is made with the ubiquitin region that preferably retains a site for a processing enzyme (eg. ubiquitin
specific processing-protease) to cleave the ubiquitin from the foreign protein. Through this method, native foreign protein can be
isolated [Miller et al. (1989) Bio/Technology 7:698].

Alternatively, foreign proteins can also be secreted from the cell by creating chimeric DNA molecules that encode a fusion protein 
comprised of a signal peptide sequence fragment that provides for secretion of the foreign protein in bacteria [US patent 
4,336,336]. The signal sequence fragment usually encodes a signal peptide comprised of hydrophobic amino acids which direct 
the secretion of the protein from the cell. The protein is either secreted into the growth media (gram-positive bacteria) or into the
periplasmic space, located between the inner and outer membrane of the cell (gram-negative bacteria). Preferably there are
processing sites, which can be cleaved either in vivo or in vitro, encoded between the signal peptide fragment and the foreign
gene. 

DNA encoding suitable signal sequences can be derived from genes for secreted bacterial proteins, such as the E.coli outer 
membrane protein gene (ompA) [Masui et al. (1983), in: Experimental Manipulation of Gene Expression; Ghrayeb et al.
(1984) EMBO J. 3:2437] and the E.coli alkaline phosphatase signal sequence (phoA) [Oka et al. (1985) Proc. Natl. Acad.
Sci. 82:7212]. As an additional example, the signal sequence of the alpha-amylase gene from various Bacillus strains can be used
to secrete heterologous proteins from B. subtilis [Palva et al. (1982) Proc. Natl. Acad. Sci. USA 79:5582; EP-A-0 244 042].

Usually, transcription termination sequences recognized by bacteria are regulatory regions located 3' to the translation stop codon,
and thus together with the promoter flank the coding sequence. These sequences direct the transcription of an mRNA which can
be translated into the polypeptide encoded by the DNA. Transcription termination sequences frequently include DNA sequences
of about 50 nucleotides capable of forming stem loop structures that aid in terminating transcription. Examples include
transcription termination sequences derived from genes with strong promoters, such as the trp gene in E.coli as well as other
biosynthetic genes. 
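
The stem loop structures mentioned above arise from inverted repeats, ie. a stretch of sequence followed, after a short loop, by its reverse complement. A minimal sketch of how such candidates might be located is given below. The function names, the stem length of 6, and the loop range of 3 to 8 nucleotides are illustrative assumptions rather than values taken from this disclosure, and only perfect repeats are detected.

```python
# Complement table for DNA bases (input is assumed to contain only A, C, G, T).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_stem_loops(seq, stem_len=6, min_loop=3, max_loop=8):
    """Return (start, stem, loop, end) for each perfect inverted repeat.

    'start' is the index of the first stem base and 'end' is the index just
    past the second stem arm. stem_len, min_loop and max_loop are
    hypothetical defaults chosen for illustration.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem_len - min_loop + 1):
        left = seq[i:i + stem_len]
        target = reverse_complement(left)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop
            # A slice running off the end is shorter than target and cannot match.
            if seq[j:j + stem_len] == target:
                hits.append((i, left, seq[i + stem_len:j], j + stem_len))
    return hits


if __name__ == "__main__":
    print(find_stem_loops("CCGCCGCCTTTTGGCGGCCC"))
```

A perfect-match search of this kind is deliberately crude; natural terminators tolerate mismatches and G-U pairing, so dedicated RNA folding tools would be the usual choice for real sequences.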

Usually, the above described components, comprising a promoter, signal sequence (if desired), coding sequence of interest, and
transcription termination sequence, are put together into expression constructs. Expression constructs are often maintained in a
replicon, such as an extrachromosomal element (eg. plasmids) capable of stable maintenance in a host, such as bacteria. The
replicon will have a replication system, thus allowing it to be maintained in a prokaryotic host either for expression or for cloning
and amplification. In addition, a replicon may be either a high or low copy number plasmid. A high copy number plasmid will
generally have a copy number ranging from about 5 to about 200, and usually about 10 to about 150. A host containing a high
copy number plasmid will preferably contain at least about 10, and more preferably at least about 20 plasmids. Either a high or
low copy number vector may be selected, depending upon the effect of the vector and the foreign protein on the host.

Alternatively, the expression constructs can be integrated into the bacterial genome with an integrating vector. Integrating vectors
usually contain at least one sequence homologous to the bacterial chromosome that allows the vector to integrate. Integrations
appear to result from recombinations between homologous DNA in the vector and the bacterial chromosome. For example,
integrating vectors constructed with DNA from various Bacillus strains integrate into the Bacillus chromosome (EP-A-0 127 328).
Integrating vectors may also be comprised of bacteriophage or transposon sequences.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers to allow for the selection of 
bacterial strains that have been transformed. Selectable markers can be expressed in the bacterial host and may include genes 
which render bacteria resistant to drugs such as ampicillin, chloramphenicol, erythromycin, kanamycin (neomycin), and
tetracycline [Davies et al. (1978) Annu. Rev. Microbiol. 32:469]. Selectable markers may also include biosynthetic genes, such
as those in the histidine, tryptophan, and leucine biosynthetic pathways.

Alternatively, some of the above described components can be put together in transformation vectors. Transformation vectors are 
usually comprised of a selectable marker that is either maintained in a replicon or developed into an integrating vector, as
described above.

Expression and transformation vectors, either extra-chromosomal replicons or integrating vectors, have been developed for
transformation into many bacteria. For example, expression vectors have been developed for, inter alia, the following bacteria:
Bacillus subtilis [Palva et al. (1982) Proc. Natl. Acad. Sci. USA 79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO
84/04541], Escherichia coli [Shimatake et al. (1981) Nature 292:128; Amann et al. (1985) Gene 40:183; Studier et al. (1986)
J. Mol. Biol. 189:113; EP-A-0 036 776, EP-A-0 136 829 and EP-A-0 136 907], Streptococcus cremoris [Powell et al. (1988)
Appl. Environ. Microbiol. 54:655], Streptococcus lividans [Powell et al. (1988) Appl. Environ. Microbiol. 54:655],
Streptomyces lividans [US patent 4,745,056].

Methods of introducing exogenous DNA into bacterial hosts are well-known in the art, and usually include either the
transformation of bacteria treated with CaCl2 or other agents, such as divalent cations and DMSO. DNA can also be introduced
into bacterial cells by electroporation. Transformation procedures usually vary with the bacterial species to be transformed. See
eg. [Masson et al. (1989) FEMS Microbiol. Lett. 60:273; Palva et al. (1982) Proc. Natl. Acad. Sci. USA 79:5582; EP-A-0
036 259 and EP-A-0 063 953; WO 84/04541, Bacillus], [Miller et al. (1988) Proc. Natl. Acad. Sci. 85:856; Wang et al.
(1990) J. Bacteriol. 172:949, Campylobacter], [Cohen et al. (1973) Proc. Natl. Acad. Sci. 69:2110; Dower et al. (1988)
Nucleic Acids Res. 16:6127; Kushner (1978) "An improved method for transformation of Escherichia coli with ColE1-derived
plasmids." In Genetic Engineering: Proceedings of the International Symposium on Genetic Engineering (eds. H.W. Boyer
and S. Nicosia); Mandel et al. (1970) J. Mol. Biol. 53:159; Taketo (1988) Biochim. Biophys. Acta 949:318; Escherichia],
[Chassy et al. (1987) FEMS Microbiol. Lett. 44:113, Lactobacillus]; [Fiedler et al. (1988) Anal. Biochem. 170:38,
Pseudomonas]; [Augustin et al. (1990) FEMS Microbiol. Lett. 66:203, Staphylococcus], [Barany et al. (1980) J. Bacteriol.
144:698; Harlander (1987) "Transformation of Streptococcus lactis by electroporation", in: Streptococcal Genetics (ed. J.
Ferretti and R. Curtiss III); Perry et al. (1981) Infect. Immun. 32:1295; Powell et al. (1988) Appl. Environ. Microbiol.
54:655; Somkuti et al. (1987) Proc. 4th Eur. Cong. Biotechnology 1:412, Streptococcus].

v. Yeast Expression

Yeast expression systems are also known to one of ordinary skill in the art. A yeast promoter is any DNA sequence capable of
binding yeast RNA polymerase and initiating the downstream (3') transcription of a coding sequence (eg. structural gene) into
mRNA. A promoter will have a transcription initiation region which is usually placed proximal to the 5' end of the coding
sequence. This transcription initiation region usually includes an RNA polymerase binding site (the "TATA Box") and a
transcription initiation site. A yeast promoter may also have a second domain called an upstream activator sequence (UAS),
which, if present, is usually distal to the structural gene. The UAS permits regulated (inducible) expression. Constitutive expression
occurs in the absence of a UAS. Regulated expression may be either positive or negative, thereby either enhancing or reducing
transcription.

Yeast is a fermenting organism with an active metabolic pathway; therefore, sequences encoding enzymes in the metabolic
pathway provide particularly useful promoter sequences. Examples include alcohol dehydrogenase (ADH) (EP-A-0 284 044),
enolase, glucokinase, glucose-6-phosphate isomerase, glyceraldehyde-3-phosphate-dehydrogenase (GAP or GAPDH),
hexokinase, phosphofructokinase, 3-phosphoglycerate mutase, and pyruvate kinase (PyK) (EPO-A-0 329 203). The yeast
PHO5 gene, encoding acid phosphatase, also provides useful promoter sequences [Myanohara et al. (1983) Proc. Natl. Acad.
Sci. USA 80:1].

In addition, synthetic promoters which do not occur in nature also function as yeast promoters. For example, UAS sequences of
one yeast promoter may be joined with the transcription activation region of another yeast promoter, creating a synthetic hybrid
promoter. Examples of such hybrid promoters include the ADH regulatory sequence linked to the GAP transcription activation
region (US Patent Nos. 4,876,197 and 4,880,734). Other examples of hybrid promoters include promoters which consist of the
regulatory sequences of either the ADH2, GAL4, GAL10, or PHO5 genes, combined with the transcriptional activation region
of a glycolytic enzyme gene such as GAP or PyK (EP-A-0 164 556). Furthermore, a yeast promoter can include naturally
occurring promoters of non-yeast origin that have the ability to bind yeast RNA polymerase and initiate transcription. Examples of
such promoters include, inter alia, [Cohen et al. (1980) Proc. Natl. Acad. Sci. USA 77:1078; Henikoff et al. (1981) Nature
283:835; Hollenberg et al. (1981) Curr. Topics Microbiol. Immunol. 96:119; Hollenberg et al. (1979) "The Expression of
Bacterial Antibiotic Resistance Genes in the Yeast Saccharomyces cerevisiae," in: Plasmids of Medical, Environmental and
Commercial Importance (eds. K.N. Timmis and A. Puhler); Mercerau-Puigalon et al. (1980) Gene 11:163; Panthier et al.
(1980) Curr. Genet. 2:109].

A DNA molecule may be expressed intracellularly in yeast. A promoter sequence may be directly linked with the DNA molecule,
in which case the first amino acid at the N-terminus of the recombinant protein will always be a methionine, which is encoded by
the ATG start codon. If desired, methionine at the N-terminus may be cleaved from the protein by in vitro incubation with
cyanogen bromide.

Fusion proteins provide an alternative for yeast expression systems, as well as in mammalian, baculovirus, and bacterial
expression systems. Usually, a DNA sequence encoding the N-terminal portion of an endogenous yeast protein, or other stable
protein, is fused to the 5' end of heterologous coding sequences. Upon expression, this construct will provide a fusion of the two
amino acid sequences. For example, the yeast or human superoxide dismutase (SOD) gene can be linked at the 5' terminus of a
foreign gene and expressed in yeast. The DNA sequence at the junction of the two amino acid sequences may or may not encode
a cleavable site. See eg. EP-A-0 196 056. Another example is a ubiquitin fusion protein. Such a fusion protein is made with the
ubiquitin region that preferably retains a site for a processing enzyme (eg. ubiquitin-specific processing protease) to cleave the
ubiquitin from the foreign protein. Through this method, therefore, native foreign protein can be isolated (eg. WO88/024066).

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating chimeric DNA molecules that
encode a fusion protein comprised of a leader sequence fragment that provides for secretion in yeast of the foreign protein.
Preferably, there are processing sites encoded between the leader fragment and the foreign gene that can be cleaved either in vivo
or in vitro. The leader sequence fragment usually encodes a signal peptide comprised of hydrophobic amino acids which direct
the secretion of the protein from the cell.

DNA encoding suitable signal sequences can be derived from genes for secreted yeast proteins, such as the yeast invertase gene 
(EP-A-0 012 873; JPO. 62,096,086) and the A-factor gene (US patent 4,588,684). Alternatively, leaders of non-yeast origin, 
such as an interferon leader, exist that also provide for secretion in yeast (EP-A-0 060 057). 

A preferred class of secretion leaders are those that employ a fragment of the yeast alpha-factor gene, which contains both a "pre"
signal sequence, and a "pro" region. The types of alpha-factor fragments that can be employed include the full-length pre-pro
alpha factor leader (about 83 amino acid residues) as well as truncated alpha-factor leaders (usually about 25 to about 50 amino
acid residues) (US Patents 4,546,083 and 4,870,008; EP-A-0 324 274). Additional leaders employing an alpha-factor leader
fragment that provides for secretion include hybrid alpha-factor leaders made with a presequence of a first yeast, but a pro-region
from a second yeast alpha-factor (eg. see WO 89/02463).

Usually, transcription termination sequences recognized by yeast are regulatory regions located 3' to the translation stop codon,
and thus together with the promoter flank the coding sequence. These sequences direct the transcription of an mRNA which can
be translated into the polypeptide encoded by the DNA. Examples include transcription terminator sequences and other
yeast-recognized termination sequences, such as those from genes coding for glycolytic enzymes.

Usually, the above described components, comprising a promoter, leader (if desired), coding sequence of interest, and
transcription termination sequence, are put together into expression constructs. Expression constructs are often maintained in a
replicon, such as an extrachromosomal element (eg. plasmids) capable of stable maintenance in a host, such as yeast or bacteria.
The replicon may have two replication systems, thus allowing it to be maintained, for example, in yeast for expression and in a
prokaryotic host for cloning and amplification. Examples of such yeast-bacteria shuttle vectors include YEp24 [Botstein et al.
(1979) Gene 8:17-24], pC1/1 [Brake et al. (1984) Proc. Natl. Acad. Sci. USA 81:4642-4646], and YRp17 [Stinchcomb et al.
(1982) J. Mol. Biol. 158:157]. In addition, a replicon may be either a high or low copy number plasmid. A high copy number
plasmid will generally have a copy number ranging from about 5 to about 200, and usually about 10 to about 150. A host
containing a high copy number plasmid will preferably have at least about 10, and more preferably at least about 20. Either a high
or low copy number vector may be selected, depending upon the effect of the vector and the foreign protein on the host. See eg.
Brake et al., supra.

Alternatively, the expression constructs can be integrated into the yeast genome with an integrating vector. Integrating vectors
usually contain at least one sequence homologous to a yeast chromosome that allows the vector to integrate, and preferably
contain two homologous sequences flanking the expression construct. Integrations appear to result from recombinations between
homologous DNA in the vector and the yeast chromosome [Orr-Weaver et al. (1983) Methods in Enzymol. 101:228-245]. An
integrating vector may be directed to a specific locus in yeast by selecting the appropriate homologous sequence for inclusion in
the vector. See Orr-Weaver et al., supra. One or more expression constructs may integrate, possibly affecting levels of
recombinant protein produced [Rine et al. (1983) Proc. Natl. Acad. Sci. USA 80:6750]. The chromosomal sequences included
in the vector can occur either as a single segment in the vector, which results in the integration of the entire vector, or two
segments homologous to adjacent segments in the chromosome and flanking the expression construct in the vector, which can
result in the stable integration of only the expression construct.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers to allow for the selection of
yeast strains that have been transformed. Selectable markers may include biosynthetic genes that can be expressed in the yeast
host, such as ADE2, HIS4, LEU2, TRP1, and ALG7, and the G418 resistance gene, which confer resistance in yeast cells to
tunicamycin and G418, respectively. In addition, a suitable selectable marker may also provide yeast with the ability to grow in the
presence of toxic compounds, such as metals. For example, the presence of CUP1 allows yeast to grow in the presence of copper
ions [Butt et al. (1987) Microbiol. Rev. 51:351].

Alternatively, some of the above described components can be put together into transformation vectors. Transformation vectors 
are usually comprised of a selectable marker that is either maintained in a replicon or developed into an integrating vector, as 
described above. 

Expression and transformation vectors, either extrachromosomal replicons or integrating vectors, have been developed for
transformation into many yeasts. For example, expression vectors have been developed for, inter alia, the following
yeasts: Candida albicans [Kurtz et al. (1986) Mol. Cell. Biol. 6:142], Candida maltosa [Kunze et al. (1985) J. Basic
Microbiol. 25:141], Hansenula polymorpha [Gleeson et al. (1986) J. Gen. Microbiol. 132:3459; Roggenkamp et al. (1986)
Mol. Gen. Genet. 202:302], Kluyveromyces fragilis [Das et al. (1984) J. Bacteriol. 158:1165], Kluyveromyces lactis [De
Louvencourt et al. (1983) J. Bacteriol. 154:737; Van den Berg et al. (1990) Bio/Technology 8:135], Pichia guilliermondii
[Kunze et al. (1985) J. Basic Microbiol. 25:141], Pichia pastoris [Cregg et al. (1985) Mol. Cell. Biol. 5:3376; US Patent
Nos. 4,837,148 and 4,929,555], Saccharomyces cerevisiae [Hinnen et al. (1978) Proc. Natl. Acad. Sci. USA 75:1929; Ito et
al. (1983) J. Bacteriol. 153:163], Schizosaccharomyces pombe [Beach and Nurse (1981) Nature 300:706], and Yarrowia
lipolytica [Davidow et al. (1985) Curr. Genet. 10:39; Gaillardin et al. (1985) Curr. Genet. 10:49].

Methods of introducing exogenous DNA into yeast hosts are well-known in the art, and usually include either the transformation
of spheroplasts or of intact yeast cells treated with alkali cations. Transformation procedures usually vary with the yeast species to
be transformed. See eg. [Kurtz et al. (1986) Mol. Cell. Biol. 6:142; Kunze et al. (1985) J. Basic Microbiol. 25:141; Candida];
[Gleeson et al. (1986) J. Gen. Microbiol. 132:3459; Roggenkamp et al. (1986) Mol. Gen. Genet. 202:302; Hansenula]; [Das
et al. (1984) J. Bacteriol. 158:1165; De Louvencourt et al. (1983) J. Bacteriol. 154:737; Van den Berg et al. (1990)
Bio/Technology 8:135; Kluyveromyces]; [Cregg et al. (1985) Mol. Cell. Biol. 5:3376; Kunze et al. (1985) J. Basic Microbiol.
25:141; US Patent Nos. 4,837,148 and 4,929,555; Pichia]; [Hinnen et al. (1978) Proc. Natl. Acad. Sci. USA 75:1929; Ito et
al. (1983) J. Bacteriol. 153:163; Saccharomyces]; [Beach and Nurse (1981) Nature 300:706; Schizosaccharomyces];
[Davidow et al. (1985) Curr. Genet. 10:39; Gaillardin et al. (1985) Curr. Genet. 10:49; Yarrowia].

Antibodies

As used herein, the term "antibody" refers to a polypeptide or group of polypeptides composed of at least one antibody
combining site. An "antibody combining site" is the three-dimensional binding space with an internal surface shape and charge
distribution complementary to the features of an epitope of an antigen, which allows a binding of the antibody with the antigen. 
"Antibody" includes, for example, vertebrate antibodies, hybrid antibodies, chimeric antibodies, humanised antibodies, altered 
antibodies, univalent antibodies, Fab proteins, and single domain antibodies. 

Antibodies against the proteins of the invention are useful for affinity chromatography, immunoassays, and distinguishing/identifying
streptococcus proteins. 

Antibodies to the proteins of the invention, both polyclonal and monoclonal, may be prepared by conventional methods. In
general, the protein is first used to immunize a suitable animal, preferably a mouse, rat, rabbit or goat. Rabbits and goats are
preferred for the preparation of polyclonal sera due to the volume of serum obtainable, and the availability of labeled anti-rabbit
and anti-goat antibodies. Immunization is generally performed by mixing or emulsifying the protein in saline, preferably in an
adjuvant such as Freund's complete adjuvant, and injecting the mixture or emulsion parenterally (generally subcutaneously or
intramuscularly). A dose of 50-200 µg/injection is typically sufficient. Immunization is generally boosted 2-6 weeks later with one
or more injections of the protein in saline, preferably using Freund's incomplete adjuvant. One may alternatively generate
antibodies by in vitro immunization using methods known in the art, which for the purposes of this invention is considered
equivalent to in vivo immunization. Polyclonal antiserum is obtained by bleeding the immunized animal into a glass or plastic
container, incubating the blood at 25°C for one hour, followed by incubating at 4°C for 2-18 hours. The serum is recovered by
centrifugation (eg. 1,000 g for 10 minutes). About 20-50 ml per bleed may be obtained from rabbits.

Monoclonal antibodies are prepared using the standard method of Kohler & Milstein [Nature (1975) 256:495-96], or a
modification thereof. Typically, a mouse or rat is immunized as described above. However, rather than bleeding the animal to
extract serum, the spleen (and optionally several large lymph nodes) is removed and dissociated into single cells. If desired, the
spleen cells may be screened (after removal of nonspecifically adherent cells) by applying a cell suspension to a plate or well
coated with the protein antigen. B-cells expressing membrane-bound immunoglobulin specific for the antigen bind to the plate, and
are not rinsed away with the rest of the suspension. Resulting B-cells, or all dissociated spleen cells, are then induced to fuse with
myeloma cells to form hybridomas, and are cultured in a selective medium (eg. hypoxanthine, aminopterin, thymidine medium,
"HAT"). The resulting hybridomas are plated by limiting dilution, and are assayed for production of antibodies which bind
specifically to the immunizing antigen (and which do not bind to unrelated antigens). The selected MAb-secreting hybridomas are
then cultured either in vitro (eg. in tissue culture bottles or hollow fiber reactors), or in vivo (as ascites in mice).

If desired, the antibodies (whether polyclonal or monoclonal) may be labeled using conventional techniques. Suitable labels
include fluorophores, chromophores, radioactive atoms (particularly 32P and 125I), electron-dense reagents, enzymes, and ligands
having specific binding partners. Enzymes are typically detected by their activity. For example, horseradish peroxidase is usually
detected by its ability to convert 3,3',5,5'-tetramethylbenzidine (TMB) to a blue pigment, quantifiable with a spectrophotometer.
"Specific binding partner" refers to a protein capable of binding a ligand molecule with high specificity, as for example in the case
of an antigen and a monoclonal antibody specific therefor. Other specific binding partners include biotin and avidin or streptavidin,
IgG and protein A, and the numerous receptor-ligand couples known in the art. It should be understood that the above
description is not meant to categorize the various labels into distinct classes, as the same label may serve in several different
modes. For example, 125I may serve as a radioactive label or as an electron-dense reagent. HRP may serve as enzyme or as
antigen for a MAb. Further, one may combine various labels for desired effect. For example, MAbs and avidin also require labels
in the practice of this invention: thus, one might label a MAb with biotin, and detect its presence with avidin labeled with 125I, or
with an anti-biotin MAb labeled with HRP. Other permutations and possibilities will be readily apparent to those of ordinary skill
in the art, and are considered as equivalents within the scope of the instant invention.

Pharmaceutical Compositions

Pharmaceutical compositions can comprise either polypeptides, antibodies, or nucleic acid of the invention. The pharmaceutical
compositions will comprise a therapeutically effective amount of either polypeptides, antibodies, or polynucleotides of the claimed
invention.

The term "therapeutically effective amount" as used herein refers to an amount of a therapeutic agent to treat, ameliorate, or
prevent a desired disease or condition, or to exhibit a detectable therapeutic or preventative effect. The effect can be detected by,
for example, chemical markers or antigen levels. Therapeutic effects also include reduction in physical symptoms, such as
decreased body temperature. The precise effective amount for a subject will depend upon the subject's size and health, the nature
and extent of the condition, and the therapeutics or combination of therapeutics selected for administration. Thus, it is not useful to
specify an exact effective amount in advance. However, the effective amount for a given situation can be determined by routine
experimentation and is within the judgement of the clinician.

For purposes of the present invention, an effective dose will be from about 0.01 mg/kg to 50 mg/kg or 0.05 mg/kg to about 10
mg/kg of the molecule of the invention in the individual to which it is administered.
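
As a purely arithmetical illustration of the ranges stated above, the total amount per individual follows from multiplying the per-kilogram bounds by body mass. The sketch below assumes a hypothetical helper named dose_range_mg and a 70 kg individual chosen only as an example; it is not dosing guidance, and the effective amount in any given case remains a matter for routine experimentation and clinical judgement.

```python
def dose_range_mg(body_mass_kg, low_mg_per_kg=0.01, high_mg_per_kg=50.0):
    """Return (minimum, maximum) total dose in mg for one individual.

    The defaults are the broader range stated in the text (0.01 to 50 mg/kg);
    the function name and signature are illustrative assumptions.
    """
    if body_mass_kg <= 0:
        raise ValueError("body mass must be positive")
    if low_mg_per_kg > high_mg_per_kg:
        raise ValueError("lower bound exceeds upper bound")
    return body_mass_kg * low_mg_per_kg, body_mass_kg * high_mg_per_kg


if __name__ == "__main__":
    # Broader range (0.01 to 50 mg/kg) for an example 70 kg individual.
    print(dose_range_mg(70.0))
    # Narrower range (0.05 to 10 mg/kg) for the same individual.
    print(dose_range_mg(70.0, 0.05, 10.0))
```

For the 70 kg example this gives roughly 0.7 mg to 3500 mg for the broader range and roughly 3.5 mg to 700 mg for the narrower one.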

A pharmaceutical composition can also contain a pharmaceutically acceptable carrier. The term "pharmaceutically acceptable
carrier" refers to a carrier for administration of a therapeutic agent, such as antibodies or a polypeptide, genes, and other
therapeutic agents. The term refers to any pharmaceutical carrier that does not itself induce the production of antibodies harmful to
the individual receiving the composition, and which may be administered without undue toxicity. Suitable carriers may be large,
slowly metabolized macromolecules such as proteins, polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids,
amino acid copolymers, and inactive virus particles. Such carriers are well known to those of ordinary skill in the art.

Pharmaceutically acceptable salts can be used therein, for example, mineral acid salts such as hydrochlorides, hydrobromides,
phosphates, sulfates, and the like; and the salts of organic acids such as acetates, propionates, malonates, benzoates, and the like.
A thorough discussion of pharmaceutically acceptable excipients is available in Remington's Pharmaceutical Sciences (Mack Pub.
Co., N.J. 1991).

Pharmaceutically acceptable carriers in therapeutic compositions may contain liquids such as water, saline, glycerol and ethanol.
Additionally, auxiliary substances, such as wetting or emulsifying agents, pH buffering substances, and the like, may be present in
such vehicles. Typically, the therapeutic compositions are prepared as injectables, either as liquid solutions or suspensions; solid
forms suitable for solution in, or suspension in, liquid vehicles prior to injection may also be prepared. Liposomes are included
within the definition of a pharmaceutically acceptable carrier.

Delivery Methods

Once formulated, the compositions of the invention can be administered directly to the subject. The subjects to be treated can be
animals; in particular, human subjects can be treated.

Direct delivery of the compositions will generally be accomplished by injection, either subcutaneously, intraperitoneally,
intravenously or intramuscularly or delivered to the interstitial space of a tissue. The compositions can also be administered into a
lesion. Other modes of administration include oral and pulmonary administration, suppositories, and transdermal or transcutaneous
applications (eg. see WO98/20734), needles, and gene guns or hyposprays. Dosage treatment may be a single dose schedule or
a multiple dose schedule.

Vaccines

Vaccines according to the invention may either be prophylactic (ie. to prevent infection) or therapeutic (ie. to treat disease after
infection).

Such vaccines comprise immunising antigen(s), immunogen(s), polypeptide(s), protein(s) or nucleic acid, usually in combination
with "pharmaceutically acceptable carriers," which include any carrier that does not itself induce the production of antibodies
harmful to the individual receiving the composition. Suitable carriers are typically large, slowly metabolized macromolecules such
as proteins, polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids, amino acid copolymers, lipid aggregates
(such as oil droplets or liposomes), and inactive virus particles. Such carriers are well known to those of ordinary skill in the art.
Additionally, these carriers may function as immunostimulating agents ("adjuvants"). Furthermore, the antigen or immunogen may
be conjugated to a bacterial toxoid, such as a toxoid from diphtheria, tetanus, cholera, H. pylori, etc. pathogens.
Preferred adjuvants to enhance effectiveness of the composition include, but are not limited to: (1) oil-in-water emulsion
formulations (with or without other specific immunostimulating agents such as muramyl peptides (see below) or bacterial cell wall
components), such as for example (a) MF59™ (WO90/14837; Chapter 10 in Vaccine Design - the subunit and adjuvant
approach (1995) ed. Powell & Newman), containing 5% Squalene, 0.5% Tween 80, and 0.5% Span 85 (optionally containing
MTP-PE) formulated into submicron particles using a microfluidizer, (b) SAF, containing 10% Squalane, 0.4% Tween 80, 5%
pluronic-blocked polymer L121, and thr-MDP either microfluidized into a submicron emulsion or vortexed to generate a larger
particle size emulsion, and (c) Ribi™ adjuvant system (RAS), (Ribi Immunochem, Hamilton, MT) containing 2% Squalene, 0.2%
Tween 80, and one or more bacterial cell wall components from the group consisting of monophosphorylipid A (MPL), trehalose
dimycolate (TDM), and cell wall skeleton (CWS), preferably MPL + CWS (Detox™); (2) saponin adjuvants, such as QS21 or
Stimulon™ (Cambridge Bioscience, Worcester, MA) may be used or particles generated therefrom such as ISCOMs
(immunostimulating complexes), which ISCOMS may be devoid of additional detergent e.g. WO00/07621; (3) Complete
Freund's Adjuvant (CFA) and Incomplete Freund's Adjuvant (IFA); (4) cytokines, such as interleukins (e.g. IL-1, IL-2, IL-4,
IL-5, IL-6, IL-7, IL-12 (WO99/44636), etc.), interferons (e.g. gamma interferon), macrophage colony stimulating factor
(M-CSF), tumor necrosis factor (TNF), etc; (5) monophosphoryl lipid A (MPL) or 3-O-deacylated MPL (3dMPL) e.g.
GB-2220221, EP-A-0689454; (6) combinations of 3dMPL with, for example, QS21 and/or oil-in-water emulsions e.g.
EP-A-0835318, EP-A-0735898, EP-A-0761231; (7) oligonucleotides comprising CpG motifs [Krieg Vaccine 2000, 19, 618-622;
Krieg Curr opin Mol Ther 2001 3:15-24; Roman et al, Nat. Med., 1997, 3, 849-854; Weiner et al, PNAS USA, 1997, 94,
10833-10837; Davis et al, J. Immunol., 1998, 160, 870-876; Chu et al, J. Exp. Med., 1997, 186, 1623-1631; Lipford et
al, Eur. J. Immunol, 1997, 27, 2340-2344; Moldoveanu et al, Vaccine, 1988, 16, 1216-1224, Krieg et al, Nature, 1995,
374, 546-549; Klinman et al, PNAS USA, 1996, 93, 2879-2883; Ballas et al, J. Immunol, 1996, 157, 1840-1845;
Cowdery et al, J. Immunol, 1996, 156, 4570-4575; Halpern et al, Cell. Immunol, 1996, 167, 72-78; Yamamoto et al,
Jpn. J. Cancer Res., 1988, 79, 866-873; Stacey et al, J. Immunol, 1996, 157, 2116-2122; Messina et al, J. Immunol,
1991, 147, 1759-1764; Yi et al, J. Immunol, 1996, 157, 4918-4925; Yi et al, J. Immunol, 1996, 157, 5394-5402; Yi et
al, J. Immunol, 1998, 160, 4755-4761; and Yi et al, J. Immunol, 1998, 160, 5898-5906; International patent applications
WO96/02555, WO98/16247, WO98/18810, WO98/40100, WO98/55495, WO98/37919 and WO98/52581] ie. containing
at least one CG dinucleotide, with 5-methylcytosine optionally being used in place of cytosine; (8) a polyoxyethylene ether or a
polyoxyethylene ester e.g. WO99/52549; (9) a polyoxyethylene sorbitan ester surfactant in combination with an octoxynol (e.g.
WO01/21207) or a polyoxyethylene alkyl ether or ester surfactant in combination with at least one additional non-ionic surfactant
such as an octoxynol (e.g. WO01/21152); (10) an immunostimulatory oligonucleotide (e.g. a CpG oligonucleotide) and a saponin
e.g. WO00/62800; (11) an immunostimulant and a particle of metal salt e.g. WO00/23105; (12) a saponin and an oil-in-water
emulsion eg. WO99/11241; (13) a saponin (e.g. QS21) + 3dMPL + IL-12 (optionally + a sterol) e.g. WO98/57659; (14)
aluminium salts, preferably hydroxide or phosphate, but any other suitable salt may also be used (e.g. hydroxyphosphate,
oxyhydroxide, orthophosphate, sulphate etc. [e.g. see chapters 8 & 9 of Powell & Newman]). Mixtures of different aluminium
salts may also be used. The salt may take any suitable form (eg. gel, crystalline, amorphous etc.); (15) other substances that act
as immunostimulating agents to enhance the efficacy of the composition. Aluminium salts and/or MF59™ are preferred.
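The structural criterion given in item (7) for CpG oligonucleotides, namely at least one CG dinucleotide with 5-methylcytosine optionally used in place of cytosine, can be illustrated with a short Python sketch. This is a minimal sketch only: the function name, the use of a lowercase "m" to denote 5-methylcytosine, and the example sequences are assumptions made for illustration and are not taken from the cited references.

```python
def has_cpg_motif(oligo: str) -> bool:
    """Return True if a 5'->3' DNA sequence contains at least one CG dinucleotide.

    Assumed notation (hypothetical, for this sketch only): bases are written
    as uppercase A, C, G, T, and 5-methylcytosine is written as lowercase 'm'.
    """
    seq = oligo.replace(" ", "")
    if not seq:
        raise ValueError("empty sequence")
    unexpected = set(seq) - set("ACGTm")
    if unexpected:
        raise ValueError(f"unexpected symbols: {sorted(unexpected)}")
    # Treat 5-methylcytosine as cytosine when looking for the dinucleotide.
    normalised = seq.replace("m", "C")
    return "CG" in normalised


if __name__ == "__main__":
    # Illustrative sequences only; they are not sequences from the cited art.
    for candidate in ("AACGTT", "AAGCTT", "AAmGTT"):
        print(candidate, has_cpg_motif(candidate))
```

The check is purely textual; it says nothing about flanking-sequence preferences or biological activity, which the cited references address.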
As mentioned above, muramyl peptides include, but are not limited to, N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP),
N-acetyl-normuramyl-L-alanyl-D-isoglutamine (nor-MDP), N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-
dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine (MTP-PE), etc.

The immunogenic compositions (eg. the immunising antigen/immunogen/polypeptide/protein/nucleic acid, pharmaceutically
acceptable carrier, and adjuvant) typically will contain diluents, such as water, saline, glycerol, ethanol, etc. Additionally, auxiliary
substances, such as wetting or emulsifying agents, pH buffering substances, and the like, may be present in such vehicles.
Typically, the immunogenic compositions are prepared as injectables, either as liquid solutions or suspensions; solid forms suitable
for solution in, or suspension in, liquid vehicles prior to injection may also be prepared. The preparation also may be emulsified or
encapsulated in liposomes for enhanced adjuvant effect, as discussed above under pharmaceutically acceptable carriers.
Immunogenic compositions used as vaccines comprise an immunologically effective amount of the antigenic or immunogenic
polypeptides, as well as any other of the above-mentioned components, as needed. By "immunologically effective amount", it is
meant that the administration of that amount to an individual, either in a single dose or as part of a series, is effective for treatment
or prevention. This amount varies depending upon the health and physical condition of the individual to be treated, the taxonomic
group of individual to be treated (eg. nonhuman primate, primate, etc.), the capacity of the individual's immune system to
synthesize antibodies, the degree of protection desired, the formulation of the vaccine, the treating doctor's assessment of the
medical situation, and other relevant factors. It is expected that the amount will fall in a relatively broad range that can be
determined through routine trials.

The immunogenic compositions are conventionally administered parenterally, eg. by injection, either subcutaneously,
intramuscularly, or transdermally/transcutaneously (eg. WO98/20734). Additional formulations suitable for other modes of
administration include oral and pulmonary formulations, suppositories, and transdermal applications. Dosage treatment may be a
single dose schedule or a multiple dose schedule. The vaccine may be administered in conjunction with other immunoregulatory
agents.

As an alternative to protein-based vaccines, DNA vaccination may be used [eg. Robinson & Torres (1997) Seminars in
Immunol 9:271-283; Donnelly et al. (1997) Annu Rev Immunol 15:617-648; later herein].
Gene Delivery Vehicles 

Gene therapy vehicles for delivery of constructs including a coding sequence of a therapeutic of the invention, to be delivered to
the mammal for expression in the mammal, can be administered either locally or systemically. These constructs can utilize viral or
non-viral vector approaches in in vivo or ex vivo modality. Expression of such coding sequence can be induced using endogenous
mammalian or heterologous promoters. Expression of the coding sequence in vivo can be either constitutive or regulated.
The invention includes gene delivery vehicles capable of expressing the contemplated nucleic acid sequences. The gene delivery
vehicle is preferably a viral vector and, more preferably, a retroviral, adenoviral, adeno-associated viral (AAV), herpes viral, or
alphavirus vector. The viral vector can also be an astrovirus, coronavirus, orthomyxovirus, papovavirus, paramyxovirus,
parvovirus, picornavirus, poxvirus, or togavirus viral vector. See generally, Jolly (1994) Cancer Gene Therapy 1:51-64; Kimura
(1994) Human Gene Therapy 5:845-852; Connelly (1995) Human Gene Therapy 6:185-193; and Kaplitt (1994) Nature
Genetics 6:148-153.

Retroviral vectors are well known in the art and we contemplate that any retroviral gene therapy vector is employable in the
invention, including B, C and D type retroviruses, xenotropic retroviruses (for example, NZB-X1, NZB-X2 and NZB9-1 (see
O'Neill (1985) J. Virol. 53:160)), polytropic retroviruses eg. MCF and MCF-MLV (see Kelly (1983) J. Virol. 45:291),
spumaviruses and lentiviruses. See RNA Tumor Viruses, Second Edition, Cold Spring Harbor Laboratory, 1985.
Portions of the retroviral gene therapy vector may be derived from different retroviruses. For example, retrovector LTRs may be
derived from a Murine Sarcoma Virus, a tRNA binding site from a Rous Sarcoma Virus, a packaging signal from a Murine
Leukemia Virus, and an origin of second strand synthesis from an Avian Leukosis Virus.

These recombinant retroviral vectors may be used to generate transduction competent retroviral vector particles by introducing
them into appropriate packaging cell lines (see US patent 5,591,624). Retrovirus vectors can be constructed for site-specific
integration into host cell DNA by incorporation of a chimeric integrase enzyme into the retroviral particle (see WO96/37626). It is
preferable that the recombinant viral vector is a replication defective recombinant virus.
Packaging cell lines suitable for use with the above-described retrovirus vectors are well known in the art, are readily prepared
(see WO95/30763 and WO92/05266), and can be used to create producer cell lines (also termed vector cell lines or "VCLs")
for the production of recombinant vector particles. Preferably, the packaging cell lines are made from human parent cells (eg.
HT1080 cells) or mink parent cell lines, which eliminates inactivation in human serum.
Preferred retroviruses for the construction of retroviral gene therapy vectors include Avian Leukosis Virus, Bovine Leukemia
Virus, Murine Leukemia Virus, Mink-Cell Focus-Inducing Virus, Murine Sarcoma Virus, Reticuloendotheliosis Virus and Rous
Sarcoma Virus. Particularly preferred Murine Leukemia Viruses include 4070A and 1504A (Hartley and Rowe (1976) J Virol
19:19-25), Abelson (ATCC No. VR-999), Friend (ATCC No. VR-245), Graffi, Gross (ATCC No. VR-590), Kirsten, Harvey
Sarcoma Virus and Rauscher (ATCC No. VR-998) and Moloney Murine Leukemia Virus (ATCC No. VR-190). Such
retroviruses may be obtained from depositories or collections such as the American Type Culture Collection ("ATCC") in
Rockville, Maryland or isolated from known sources using commonly available techniques.

Exemplary known retroviral gene therapy vectors employable in this invention include those described in patent applications
GB2200651, EP0415731, EP0345242, EP0334301, WO89/02468; WO89/05349, WO89/09271, WO90/02806,
WO90/07936, WO94/03622, WO93/25698, WO93/25234, WO93/11230, WO93/10218, WO91/02805, WO91/02825,
WO95/07994, US 5,219,740, US 4,405,712, US 4,861,719, US 4,980,289, US 4,777,127, US 5,591,624. See also Vile
(1993) Cancer Res 53:3860-3864; Vile (1993) Cancer Res 53:962-967; Ram (1993) Cancer Res 53:83-88;
Takamiya (1992) J Neurosci Res 33:493-503; Baba (1993) J Neurosurg 79:729-735; Mann (1983) Cell 33:153; Cane (1984)
Proc Natl Acad Sci 81:6349; and Miller (1990) Human Gene Therapy 1.

Human adenoviral gene therapy vectors are also known in the art and employable in this invention. See, for example, Berkner
(1988) Biotechniques 6:616 and Rosenfeld (1991) Science 252:431, and WO93/07283, WO93/06223, and WO93/07282.
Exemplary known adenoviral gene therapy vectors employable in this invention include those described in the above referenced
documents and in WO94/12649, WO93/03769, WO93/19191, WO94/28938, WO95/11984, WO95/00655, WO95/27071,
WO95/29993, WO95/34671, WO96/05320, WO94/08026, WO94/11506, WO93/06223, WO94/24299, WO95/14102,
WO95/24297, WO95/02697, WO94/28152, WO94/24299, WO95/09241, WO95/25807, WO95/05835, WO94/18922 and
WO95/09654. Alternatively, administration of DNA linked to killed adenovirus as described in Curiel (1992) Hum. Gene Ther.
3:147-154 may be employed. The gene delivery vehicles of the invention also include adenovirus associated virus (AAV) vectors.
Leading and preferred examples of such vectors for use in this invention are the AAV-2 based vectors disclosed in Srivastava,
WO93/09239. Most preferred AAV vectors comprise the two AAV inverted terminal repeats in which the native D-sequences
are modified by substitution of nucleotides, such that at least 5 native nucleotides and up to 18 native nucleotides, preferably at
least 10 native nucleotides up to 18 native nucleotides, most preferably 10 native nucleotides are retained and the remaining
nucleotides of the D-sequence are deleted or replaced with non-native nucleotides. The native D-sequences of the AAV inverted
terminal repeats are sequences of 20 consecutive nucleotides in each AAV inverted terminal repeat (ie. there is one sequence at
each end) which are not involved in HP formation. The non-native replacement nucleotide may be any nucleotide other than the
nucleotide found in the native D-sequence in the same position. Other employable exemplary AAV vectors are pWP-19,
pWN-1, both of which are disclosed in Nahreini (1993) Gene 124:257-262. Another example of such an AAV vector is
psub201 (see Samulski (1987) J. Virol. 61:3096). Another exemplary AAV vector is the Double-D ITR vector. Construction of
the Double-D ITR vector is disclosed in US Patent 5,478,745. Still other vectors are those disclosed in Carter US Patent
4,797,368 and Muzyczka US Patent 5,139,941, Chatterjee US Patent 5,474,935, and Kotin WO94/288157. Yet a further
example of an AAV vector employable in this invention is SSV9AFABTKneo, which contains the AFP enhancer and albumin
promoter and directs expression predominantly in the liver. Its structure and construction are disclosed in Su (1996) Human
Gene Therapy 7:463-470. Additional AAV gene therapy vectors are described in US 5,354,678, US 5,173,414, US
5,139,941, and US 5,252,479.

The gene therapy vectors of the invention also include herpes vectors. Leading and preferred examples are herpes simplex virus 
vectors containing a sequence encoding a thymidine kinase polypeptide such as those disclosed in US 5,288,641 and 
EP0176170 (Roizman). Additional exemplary herpes simplex virus vectors include HFEMICP6-LacZ disclosed in
WO95/04139 (Wistar Institute), pHSVlac described in Geller (1988) Science 241:1667-1669 and in WO90/09441 and 
WO92/07945, HSV Us3::pgC-lacZ described in Fink (1992) Human Gene Therapy 3:11-19 and HSV 7134, 2 RH 105 and 
GAL4 described in EP 0453242 (Breakefield), and those deposited with the ATCC with accession numbers VR-977 and 
VR-260. 

Also contemplated are alpha virus gene therapy vectors that can be employed in this invention. Preferred alpha virus vectors are
Sindbis viruses vectors. Togaviruses, Semliki Forest virus (ATCC VR-67; ATCC VR-1247), Middleberg virus (ATCC
VR-370), Ross River virus (ATCC VR-373; ATCC VR-1246), Venezuelan equine encephalitis virus (ATCC VR-923; ATCC
VR-1250; ATCC VR-1249; ATCC VR-532), and those described in US patents 5,091,309, 5,217,879, and WO92/10578.
More particularly, those alpha virus vectors described in US Serial No. 08/405,627, filed March 15, 1995, WO94/21792,
WO92/10578, WO95/07994, US 5,091,309 and US 5,217,879 are employable. Such alpha viruses may be obtained from
depositories or collections such as the ATCC in Rockville, Maryland or isolated from known sources using commonly available
techniques. Preferably, alphavirus vectors with reduced cytotoxicity are used (see USSN 08/679640).
DNA vector systems such as eukaryotic layered expression systems are also useful for expressing the nucleic acids of the 
invention. See WO95/07994 for a detailed description of eukaryotic layered expression systems. Preferably, the eukaryotic 
layered expression systems of the invention are derived from alphavirus vectors and most preferably from Sindbis viral vectors. 

Other viral vectors suitable for use in the present invention include those derived from poliovirus, for example ATCC VR-58 and
those described in Evans, Nature 339 (1989) 385 and Sabin (1973) J. Biol. Standardization 1:115; rhinovirus, for example
ATCC VR-1110 and those described in Arnold (1990) J Cell Biochem L401; pox viruses such as canary pox virus or vaccinia
virus, for example ATCC VR-111 and ATCC VR-2010 and those described in Fisher-Hoch (1989) Proc Natl Acad Sci
86:317; Flexner (1989) Ann NY Acad Sci 569:86, Flexner (1990) Vaccine 8:17; in US 4,603,112 and US 4,769,330 and
WO89/01973; SV40 virus, for example ATCC VR-305 and those described in Mulligan (1979) Nature 277:108 and Madzak
(1992) J Gen Virol 73:1533; influenza virus, for example ATCC VR-797 and recombinant influenza viruses made employing
reverse genetics techniques as described in US 5,166,057 and in Enami (1990) Proc Natl Acad Sci 87:3802-3805; Enami &
Palese (1991) J Virol 65:2711-2713 and Luytjes (1989) Cell 59:110, (see also McMichael (1983) NEJ Med 309:13, and Yap
(1978) Nature 273:238 and Nature (1979) 277:108); human immunodeficiency virus as described in EP-0386882 and in
Buchschacher (1992) J. Virol. 66:2731; measles virus, for example ATCC VR-67 and VR-1247 and those described in
EP-0440219; Aura virus, for example ATCC VR-368; Bebaru virus, for example ATCC VR-600 and ATCC VR-1240; Cabassou
virus, for example ATCC VR-922; Chikungunya virus, for example ATCC VR-64 and ATCC VR-1241; Fort Morgan Virus,
for example ATCC VR-924; Getah virus, for example ATCC VR-369 and ATCC VR-1243; Kyzylagach virus, for example
ATCC VR-927; Mayaro virus, for example ATCC VR-66; Mucambo virus, for example ATCC VR-580 and ATCC VR-1244;
Ndumu virus, for example ATCC VR-371; Pixuna virus, for example ATCC VR-372 and ATCC VR-1245; Tonate virus, for
example ATCC VR-925; Triniti virus, for example ATCC VR-469; Una virus, for example ATCC VR-374; Whataroa virus, for
example ATCC VR-926; Y-62-33 virus, for example ATCC VR-375; O'Nyong virus, Eastern encephalitis virus, for example
ATCC VR-65 and ATCC VR-1242; Western encephalitis virus, for example ATCC VR-70, ATCC VR-1251, ATCC VR-622
and ATCC VR-1252; and coronavirus, for example ATCC VR-740 and those described in Hamre (1966) Proc Soc Exp Biol
Med 121:190.

Delivery of the compositions of this invention into cells is not limited to the above mentioned viral vectors. Other delivery methods
and media may be employed such as, for example, nucleic acid expression vectors, polycationic condensed DNA linked or
unlinked to killed adenovirus alone, for example see US Serial No. 08/366,787, filed December 30, 1994 and Curiel (1992)
Hum Gene Ther 3:147-154, ligand linked DNA, for example see Wu (1989) J Biol Chem 264:16985-16987, eucaryotic cell
delivery vehicles cells, for example see US Serial No. 08/240,030, filed May 9, 1994, and US Serial No. 08/404,796, deposition
of photopolymerized hydrogel materials, hand-held gene transfer particle gun, as described in US Patent 5,149,655, ionizing
radiation as described in US 5,206,152 and in WO92/11033, nucleic charge neutralization or fusion with cell membranes.
Additional approaches are described in Philip (1994) Mol Cell Biol 14:2411-2418 and in Woffendin (1994) Proc Natl Acad
Sci 91:11581-11585.

Particle mediated gene transfer may be employed, for example see US Serial No. 60/023,867. Briefly, the sequence can be
inserted into conventional vectors that contain conventional control sequences for high level expression, and then incubated with
synthetic gene transfer molecules such as polymeric DNA-binding cations like polylysine, protamine, and albumin, linked to cell
targeting ligands such as asialoorosomucoid, as described in Wu & Wu (1987) J. Biol. Chem. 262:4429-4432, insulin as
described in Hucked (1990) Biochem Pharmacol 40:253-263, galactose as described in Plank (1992) Bioconjugate Chem
3:533-539, lactose or transferrin.

Naked DNA may also be employed. Exemplary naked DNA introduction methods are described in WO 90/11092 and US 
5,580,859. Uptake efficiency may be improved using biodegradable latex beads. DNA coated latex beads are efficiently 
transported into cells after endocytosis initiation by the beads. The method may be improved further by treatment of the beads to 
increase hydrophobicity and thereby facilitate disruption of the endosome and release of the DNA into the cytoplasm.

Liposomes that can act as gene delivery vehicles are described in US 5,422,120, WO95/13796, WO94/23697, WO91/14445
and EP-524,968. As described in USSN. 60/023,867, on non-viral delivery, the nucleic acid sequences encoding a polypeptide
can be inserted into conventional vectors that contain conventional control sequences for high level expression, and then be
incubated with synthetic gene transfer molecules such as polymeric DNA-binding cations like polylysine, protamine, and albumin,
linked to cell targeting ligands such as asialoorosomucoid, insulin, galactose, lactose, or transferrin. Other delivery systems include
the use of liposomes to encapsulate DNA comprising the gene under the control of a variety of tissue-specific or
ubiquitously-active promoters. Further non-viral delivery suitable for use includes mechanical delivery systems such as the
approach described in Woffendin et al (1994) Proc. Natl. Acad. Sci. USA 91(24):11581-11585. Moreover, the coding
sequence and the product of expression of such can be delivered through deposition of photopolymerized hydrogel materials.
Other conventional methods for gene delivery that can be used for delivery of the coding sequence include, for example, use of
hand-held gene transfer particle gun, as described in US 5,149,655; use of ionizing radiation for activating transferred gene, as
described in US 5,206,152 and WO92/11033.

Exemplary liposome and polycationic gene delivery vehicles are those described in US 5,422,120 and 4,762,915; in WO
95/13796; WO94/23697; and WO91/14445; in EP-0524968; and in Stryer, Biochemistry, pages 236-240 (1975) W.H.
Freeman, San Francisco; Szoka (1980) Biochem Biophys Acta 600:1; Bayer (1979) Biochem Biophys Acta 550:464; Rivnay
(1987) Meth Enzymol 149:119; Wang (1987) Proc Natl Acad Sci 84:7851; Plant (1989) Anal Biochem 176:420.

A polynucleotide composition can comprise a therapeutically effective amount of a gene therapy vehicle, as the term is defined
above. For purposes of the present invention, an effective dose will be from about 0.01 mg/kg to 50 mg/kg or 0.05 mg/kg to
about 10 mg/kg of the DNA constructs in the individual to which it is administered.
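The per-kilogram ranges above translate into absolute amounts by simple multiplication, which the following Python sketch illustrates. It is a minimal arithmetic sketch, not a dosing tool: the function name and the 70 kg body mass are assumptions chosen for illustration, and only the mg/kg bounds come from the text.

```python
def dose_range_mg(body_mass_kg, low_mg_per_kg=0.01, high_mg_per_kg=50.0):
    """Return (minimum, maximum) amount of DNA construct in mg for one individual.

    The default bounds are the broader range stated in the text
    (about 0.01 mg/kg to 50 mg/kg).
    """
    if body_mass_kg <= 0:
        raise ValueError("body mass must be positive")
    if not 0 < low_mg_per_kg <= high_mg_per_kg:
        raise ValueError("dose bounds must satisfy 0 < low <= high")
    return (low_mg_per_kg * body_mass_kg, high_mg_per_kg * body_mass_kg)


if __name__ == "__main__":
    mass_kg = 70.0  # hypothetical body mass, for illustration only
    # Broader range, then the narrower 0.05 mg/kg to 10 mg/kg range.
    for low, high in ((0.01, 50.0), (0.05, 10.0)):
        min_mg, max_mg = dose_range_mg(mass_kg, low, high)
        print(f"{low}-{high} mg/kg at {mass_kg} kg: {min_mg:.2f} mg to {max_mg:.2f} mg")
```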
Delivery Methods 

Once formulated, the polynucleotide compositions of the invention can be administered (1) directly to the subject; (2) delivered ex
vivo, to cells derived from the subject; or (3) in vitro for expression of recombinant proteins. The subjects to be treated can be
mammals or birds. Also, human subjects can be treated.

Direct delivery of the compositions will generally be accomplished by injection, either subcutaneously, intraperitoneally,
intravenously or intramuscularly or delivered to the interstitial space of a tissue. The compositions can also be administered into a
lesion. Other modes of administration include oral and pulmonary administration, suppositories, and transdermal or transcutaneous
applications (eg. see WO98/20734), needles, and gene guns or hyposprays. Dosage treatment may be a single dose schedule or
a multiple dose schedule.

Methods for the ex vivo delivery and reimplantation of transformed cells into a subject are known in the art and described in eg.
WO93/14778. Examples of cells useful in ex vivo applications include, for example, stem cells, particularly hematopoietic, lymph
cells, macrophages, dendritic cells, or tumor cells.
Generally, delivery of nucleic acids for both ex vivo and in vitro applications can be accomplished by the following procedures,
for example, dextran-mediated transfection, calcium phosphate precipitation, polybrene mediated transfection, protoplast fusion,
electroporation, encapsulation of the polynucleotide(s) in liposomes, and direct microinjection of the DNA into nuclei, all well
known in the art.

Polynucleotide and polypeptide pharmaceutical compositions 
In addition to the pharmaceutically acceptable carriers and salts described above, the following additional agents can be used with
polynucleotide and/or polypeptide compositions.
A. Polypeptides

One example are polypeptides which include, without limitation: asialoorosomucoid (ASOR); transferrin; asialoglycoproteins;
antibodies; antibody fragments; ferritin; interleukins; interferons; granulocyte-macrophage colony stimulating factor (GM-CSF),
granulocyte colony stimulating factor (G-CSF), macrophage colony stimulating factor (M-CSF), stem cell factor and
erythropoietin. Viral antigens, such as envelope proteins, can also be used. Also, proteins from other invasive organisms, such as
the 17 amino acid peptide from the circumsporozoite protein of Plasmodium falciparum known as RII.
B. Hormones, Vitamins, etc.

Other groups that can be included are, for example: hormones, steroids, androgens, estrogens, thyroid hormone, or vitamins, folic
acid.



WO 02/34771 PCT/GB01/04789 



C. Polyalkylenes, Polysaccharides, etc.

Also, polyalkylene glycol can be included with the desired polynucleotides/polypeptides. In a preferred embodiment, the
polyalkylene glycol is polyethylene glycol. In addition, mono-, di-, or polysaccharides can be included. In a preferred
embodiment of this aspect, the polysaccharide is dextran or DEAE-dextran. Also, chitosan and poly(lactide-co-glycolide).
D. Lipids and Liposomes

The desired polynucleotide/polypeptide can also be encapsulated in lipids or packaged in liposomes prior to delivery to the
subject or to cells derived therefrom.

Lipid encapsulation is generally accomplished using liposomes which are able to stably bind or entrap and retain nucleic acid. The
ratio of condensed polynucleotide to lipid preparation can vary but will generally be around 1:1 (mg DNA:micromoles lipid), or
more of lipid. For a review of the use of liposomes as carriers for delivery of nucleic acids, see, Hug and Sleight (1991) Biochim.
Biophys. Acta. 1097:1-17; Straubinger (1983) Meth. Enzymol. 101:512-527.
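The stated ratio of about 1:1 (mg DNA:micromoles lipid), or more of lipid, can be read as a lower bound on the amount of lipid for a given mass of DNA. The following Python sketch shows that arithmetic. The function names and the example quantities are assumptions for illustration; the text itself notes that the ratio can vary.

```python
def lipid_needed_umol(dna_mg, umol_lipid_per_mg_dna=1.0):
    """Micromoles of lipid corresponding to the given DNA mass at the chosen ratio.

    The default of 1.0 reflects the approximate 1:1 (mg DNA:micromoles lipid)
    ratio stated in the text.
    """
    if dna_mg < 0:
        raise ValueError("DNA mass cannot be negative")
    if umol_lipid_per_mg_dna <= 0:
        raise ValueError("ratio must be positive")
    return dna_mg * umol_lipid_per_mg_dna


def meets_minimum_ratio(dna_mg, lipid_umol, umol_lipid_per_mg_dna=1.0):
    """True if the preparation has at least the chosen amount of lipid per mg DNA."""
    return lipid_umol >= lipid_needed_umol(dna_mg, umol_lipid_per_mg_dna)


if __name__ == "__main__":
    # Hypothetical preparation: 2 mg DNA combined with 3 micromoles of lipid.
    print(lipid_needed_umol(2.0))
    print(meets_minimum_ratio(2.0, 3.0))
```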

Liposomal preparations for use in the present invention include cationic (positively charged), anionic (negatively charged) and
neutral preparations. Cationic liposomes have been shown to mediate intracellular delivery of plasmid DNA (Felgner (1987)
Proc. Natl. Acad. Sci. USA 84:7413-7416); mRNA (Malone (1989) Proc. Natl. Acad. Sci. USA 86:6077-6081); and purified
transcription factors (Debs (1990) J. Biol. Chem. 265:10189-10192), in functional form.

Cationic liposomes are readily available. For example, N[1-2,3-dioleyloxy)propyl]-N,N,N-triethylammonium (DOTMA)
liposomes are available under the trademark Lipofectin, from GIBCO BRL, Grand Island, NY. (See, also, Felgner supra). Other
commercially available liposomes include transfectace (DDAB/DOPE) and DOTAP/DOPE (Boehringer). Other cationic
liposomes can be prepared from readily available materials using techniques well known in the art. See, eg. Szoka (1978) Proc.
Natl. Acad. Sci. USA 75:4194-4198; WO90/11092 for a description of the synthesis of DOTAP
(1,2-bis(oleoyloxy)-3-(trimethylammonio)propane) liposomes.

Similarly, anionic and neutral liposomes are readily available, such as from Avanti Polar Lipids (Birmingham, AL), or can be easily
prepared using readily available materials. Such materials include phosphatidyl choline, cholesterol, phosphatidyl ethanolamine,
dioleoylphosphatidyl choline (DOPC), dioleoylphosphatidyl glycerol (DOPG), dioleoylphosphatidyl ethanolamine (DOPE), among
others. These materials can also be mixed with the DOTMA and DOTAP starting materials in appropriate ratios. Methods for
making liposomes using these materials are well known in the art.

The liposomes can comprise multilamellar vesicles (MLVs), small unilamellar vesicles (SUVs), or large unilamellar vesicles
(LUVs). The various liposome-nucleic acid complexes are prepared using methods known in the art. See eg. Straubinger (1983)
Meth. Enzymol. 101:512-527; Szoka (1978) Proc. Natl. Acad. Sci. USA 75:4194-4198; Papahadjopoulos (1975) Biochim.
Biophys. Acta 394:483; Wilson (1979) Cell 17:77; Deamer & Bangham (1976) Biochim. Biophys. Acta 443:629; Ostro
(1977) Biochem. Biophys. Res. Commun. 76:836; Fraley (1979) Proc. Natl. Acad. Sci. USA 76:3348; Enoch & Strittmatter
(1979) Proc. Natl. Acad. Sci. USA 76:145; Fraley (1980) J. Biol. Chem. 255:10431; Szoka & Papahadjopoulos
(1978) Proc. Natl. Acad. Sci. USA 75:145; and Schaefer-Ridder (1982) Science 215:166.
E. Lipoproteins

In addition, lipoproteins can be included with the polynucleotide/polypeptide to be delivered. Examples of lipoproteins to be
utilized include: chylomicrons, HDL, IDL, LDL, and VLDL. Mutants, fragments, or fusions of these proteins can also be used.
Also, modifications of naturally occurring lipoproteins can be used, such as acetylated LDL. These lipoproteins can target the
delivery of polynucleotides to cells expressing lipoprotein receptors. Preferably, if lipoproteins are included with the
polynucleotide to be delivered, no other targeting ligand is included in the composition.

Naturally occurring lipoproteins comprise a lipid and a protein portion. The protein portions are known as apoproteins. At the
present, apoproteins A, B, C, D, and E have been isolated and identified. At least two of these contain several proteins,
designated by Roman numerals, AI, AII, AIV; CI, CII, CIII.

A lipoprotein can comprise more than one apoprotein. For example, naturally occurring chylomicrons comprise A, B, C & E;
over time these lipoproteins lose A and acquire C & E. VLDL comprises A, B, C & E apoproteins, LDL comprises apoprotein
B; and HDL comprises apoproteins A, C, & E.
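The apoprotein composition just described can be held in a small lookup table, for example when deciding which lipoprotein classes carry a given apoprotein. The following Python sketch encodes only the memberships stated in the paragraph above. The names `APOPROTEINS` and `lipoproteins_with` are hypothetical, and IDL is omitted because the text does not give its apoprotein content.

```python
# Apoprotein classes per lipoprotein, as stated in the preceding paragraph.
# Chylomicrons are listed with their initial complement (A, B, C, E); the text
# notes that they lose A and acquire C & E over time.
APOPROTEINS = {
    "chylomicrons": {"A", "B", "C", "E"},
    "VLDL": {"A", "B", "C", "E"},
    "LDL": {"B"},
    "HDL": {"A", "C", "E"},
}


def lipoproteins_with(apoprotein: str) -> list:
    """Return the lipoprotein classes (in table order) that carry the apoprotein class."""
    key = apoprotein.strip().upper()
    return [name for name, apos in APOPROTEINS.items() if key in apos]


if __name__ == "__main__":
    for apo in ("A", "B", "D"):
        print(apo, lipoproteins_with(apo))
```

Apoprotein D returns an empty list here because the text names it as an isolated apoprotein but does not assign it to any of the listed lipoproteins.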

The amino acid sequences of these apoproteins are known and are described in, for example, Breslow (1985) Annu Rev. Biochem 54:699;
Law (1986) Adv. Exp Med. Biol. 151:162; Chen (1986) J Biol Chem 261:12918; Kane (1980) Proc Natl Acad Sci USA
77:2465; and Utermann (1984) Hum Genet 65:232.

Lipoproteins contain a variety of lipids including triglycerides, cholesterol (free and esters), and phospholipids. The composition
of the lipids varies in naturally occurring lipoproteins. For example, chylomicrons comprise mainly triglycerides. A more detailed
description of the lipid content of naturally occurring lipoproteins can be found, for example, in Meth. Enzymol. 128 (1986). The
composition of the lipids is chosen to aid in conformation of the apoprotein for receptor binding activity. The composition of
lipids can also be chosen to facilitate hydrophobic interaction and association with the polynucleotide binding molecule.

Naturally occurring lipoproteins can be isolated from serum by ultracentrifugation, for instance. Such methods are described in
Meth. Enzymol. (supra); Pitas (1980) J. Biochem. 255:5454-5460 and Mahey (1979) J Clin. Invest 64:743-750.
Lipoproteins can also be produced by in vitro or recombinant methods by expression of the apoprotein genes in a desired host
cell. See, for example, Atkinson (1986) Annu Rev Biophys Chem 15:403 and Radding (1958) Biochim Biophys Acta 30:443.
Lipoproteins can also be purchased from commercial suppliers, such as Biomedical Technologies, Inc., Stoughton, MA, USA.
Further description of lipoproteins can be found in WO98/06437.
F. Polycationic Agents

Polycationic agents can be included, with or without lipoprotein, in a composition with the desired polynucleotide/polypeptide to 
be delivered. 

Polycationic agents typically exhibit a net positive charge at physiologically relevant pH and are capable of neutralizing the
electrical charge of nucleic acids to facilitate delivery to a desired location. These agents have in vitro, ex vivo, and in vivo
applications. Polycationic agents can be used to deliver nucleic acids to a living subject either intramuscularly, subcutaneously, etc.
The following are examples of useful polypeptides as polycationic agents: polylysine, polyarginine, polyornithine, and protamine.
Other examples include histones, protamines, human serum albumin, DNA binding proteins, nonhistone chromosomal proteins,
and coat proteins from DNA viruses, such as ΦX174. Transcriptional factors also contain domains that bind DNA and therefore may
be useful as nucleic acid condensing agents. Briefly, transcriptional factors such as C/CEBP, c-jun, c-fos, AP-1, AP-2, AP-3,
CPF, Prot-1, Sp-1, Oct-1, Oct-2, CREP, and TFIID contain basic domains that bind DNA sequences.
Organic polycationic agents include: spermine, spermidine, and putrescine.

The dimensions and the physical properties of a polycationic agent can be extrapolated from the list above, to construct other
polypeptide polycationic agents or to produce synthetic polycationic agents.

Synthetic polycationic agents which are useful include, for example, DEAE-dextran and polybrene. Lipofectin™ and
LipofectAMINE™ are monomers that form polycationic complexes when combined with polynucleotides/polypeptides.
Immunodiagnostic Assays

Streptococcus antigens of the invention can be used in immunoassays to detect antibody levels (or, conversely, anti-streptococcus 
antibodies can be used to detect antigen levels). Immunoassays based on well defined, recombinant antigens can be developed to
replace invasive diagnostic methods. Antibodies to streptococcus proteins within biological samples, including for example, blood
or serum samples, can be detected. Design of the immunoassays is subject to a great deal of variation, and a variety of these are
known in the art. Protocols for the immunoassay may be based, for example, upon competition, or direct reaction, or sandwich
type assays. Protocols may also, for example, use solid supports, or may be by immunoprecipitation. Most assays involve the use
of labeled antibody or polypeptide; the labels may be, for example, fluorescent, chemiluminescent, radioactive, or dye molecules.
Assays which amplify the signals from the probe are also known; examples of which are assays which utilize biotin and avidin, and 
enzyme-labeled and mediated immunoassays, such as ELISA assays. 

Kits suitable for immunodiagnosis and containing the appropriate labeled reagents are constructed by packaging the appropriate
materials, including the compositions of the invention, in suitable containers, along with the remaining reagents and materials (for
example, suitable buffers, salt solutions, etc.) required for the conduct of the assay, as well as a suitable set of assay instructions.
Nucleic Acid Hybridisation 

"Hybridization" refers to the association of two nucleic acid sequences to one another by hydrogen bonding. Typically, one
sequence will be fixed to a solid support and the other will be free in solution. Then, the two sequences will be placed in contact
with one another under conditions that favor hydrogen bonding. Factors that affect this bonding include: the type and volume of
solvent; reaction temperature; time of hybridization; agitation; agents to block the nonspecific attachment of the liquid phase
sequence to the solid support (Denhardt's reagent or BLOTTO); concentration of the sequences; use of compounds to increase
the rate of association of sequences (dextran sulfate or polyethylene glycol); and the stringency of the washing conditions following
hybridization. See Sambrook et al. [supra] Volume 2, chapter 9, pages 9.47 to 9.57.

"Stringency" refers to conditions in a hybridization reaction that favor association of very similar sequences over sequences that
differ. For example, the combination of temperature and salt concentration should be chosen that is approximately 120 to 200°C
below the calculated Tm of the hybrid under study. The temperature and salt conditions can often be determined empirically in
preliminary experiments in which samples of genomic DNA immobilized on filters are hybridized to the sequence of interest and
then washed under conditions of different stringencies. See Sambrook et al. at page 9.50.

Variables to consider when performing, for example, a Southern blot are (1) the complexity of the DNA being blotted and (2) the
homology between the probe and the sequences being detected. The total amount of the fragment(s) to be studied can vary a
magnitude of 10, from 0.1 to 1 µg for a plasmid or phage digest to 10^-9 to 10^-8 g for a single copy gene in a highly complex
eukaryotic genome. For lower complexity polynucleotides, substantially shorter blotting, hybridization, and exposure times, a
smaller amount of starting polynucleotides, and lower specific activity of probes can be used. For example, a single-copy yeast
gene can be detected with an exposure time of only 1 hour starting with 1 µg of yeast DNA, blotting for two hours, and
hybridizing for 4-8 hours with a probe of 10^8 cpm/µg. For a single-copy mammalian gene a conservative approach would start
with 10 µg of DNA, blot overnight, and hybridize overnight in the presence of 10% dextran sulfate using a probe of greater than
10^8 cpm/µg, resulting in an exposure time of ~24 hours.

Several factors can affect the melting temperature (Tm) of a DNA-DNA hybrid between the probe and the fragment of interest,
and consequently, the appropriate conditions for hybridization and washing. In many cases the probe is not 100% homologous to
the fragment. Other commonly encountered variables include the length and total G+C content of the hybridizing sequences and
the ionic strength and formamide content of the hybridization buffer. The effects of all of these factors can be approximated by a
single equation:

Tm = 81 + 16.6(log10 Ci) + 0.4[%(G + C)] - 0.6(%formamide) - 600/n - 1.5(%mismatch)

where Ci is the salt concentration (monovalent ions) and n is the length of the hybrid in base pairs (slightly modified from
Meinkoth & Wahl (1984) Anal. Biochem. 138: 267-284).
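
As an illustration only, the equation can be evaluated with a short Python sketch. The function and parameter names are hypothetical and the example inputs (1 M monovalent salt, 50% G+C, 50% formamide, a 600 bp hybrid) are arbitrary values chosen for the arithmetic, not conditions taken from the examples.

```python
import math


def hybrid_tm(salt_molar, gc_percent, formamide_percent, length_bp, mismatch_percent=0.0):
    """Approximate Tm (deg C) of a DNA-DNA hybrid from the single equation in the text.

    salt_molar        -- Ci, monovalent ion concentration (mol/l)
    gc_percent        -- %(G + C) of the hybridizing sequence
    formamide_percent -- % formamide in the hybridization buffer
    length_bp         -- n, length of the hybrid in base pairs
    mismatch_percent  -- % mismatch between probe and target
    """
    return (81.0
            + 16.6 * math.log10(salt_molar)
            + 0.4 * gc_percent
            - 0.6 * formamide_percent
            - 600.0 / length_bp
            - 1.5 * mismatch_percent)


# Arbitrary illustrative inputs: perfectly matched probe vs. 10% mismatch.
perfect = hybrid_tm(1.0, 50.0, 50.0, 600)
mismatched = hybrid_tm(1.0, 50.0, 50.0, 600, mismatch_percent=10.0)
print(round(perfect, 1), round(mismatched, 1))
```

Under these assumed inputs each 1% of mismatch lowers the estimate by 1.5°C, which is why probes of lower homology call for lower hybridization temperatures.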

In designing a hybridization experiment, some factors affecting nucleic acid hybridization can be conveniently altered. The
temperature of the hybridization and washes and the salt concentration during the washes are the simplest to adjust. As the
temperature of the hybridization increases (i.e. stringency), it becomes less likely for hybridization to occur between strands that
are nonhomologous, and as a result, background decreases. If the radiolabeled probe is not completely homologous with the
immobilized fragment (as is frequently the case in gene family and interspecies hybridization experiments), the hybridization
temperature must be reduced and background will increase. The temperature of the washes affects the intensity of the hybridizing
band and the degree of background in a similar manner. The stringency of the washes is also increased with decreasing salt
concentrations.

In general, convenient hybridization temperatures in the presence of 50% formamide are 42°C for a probe which is 95% to 100%
homologous to the target fragment, 37°C for 90% to 95% homology, and 32°C for 85% to 90% homology. For lower
homologies, formamide content should be lowered and temperature adjusted accordingly, using the equation above. If the
homology between the probe and the target fragment is not known, the simplest approach is to start with both hybridization and
wash conditions which are nonstringent. If non-specific bands or high background are observed after autoradiography, the filter
can be washed at high stringency and reexposed. If the time required for exposure makes this approach impractical, several
hybridization and/or washing stringencies should be tested in parallel.
Nucleic Acid Probe Assays 

Methods such as PCR, branched DNA probe assays, or blotting techniques utilizing nucleic acid probes according to the
invention can determine the presence of cDNA or mRNA. A probe is said to "hybridize" with a sequence of the invention if it can
form a duplex or double stranded complex, which is stable enough to be detected.

The nucleic acid probes will hybridize to the streptococcus nucleotide sequences of the invention (including both sense and
antisense strands). Though many different nucleotide sequences will encode the amino acid sequence, the native streptococcus
sequence is preferred because it is the actual sequence present in cells. mRNA represents a coding sequence and so a probe
should be complementary to the coding sequence; single-stranded cDNA is complementary to mRNA, and so a cDNA probe
should be complementary to the non-coding sequence.

The probe sequence need not be identical to the streptococcus sequence (or its complement); some variation in the sequence
and length can lead to increased assay sensitivity if the nucleic acid probe can form a duplex with target nucleotides, which can be
detected. Also, the nucleic acid probe can include additional nucleotides to stabilize the formed duplex. Additional streptococcus
sequence may also be helpful as a label to detect the formed duplex. For example, a non-complementary nucleotide sequence
may be attached to the 5' end of the probe, with the remainder of the probe sequence being complementary to a streptococcus
sequence. Alternatively, non-complementary bases or longer sequences can be interspersed into the probe, provided that the
probe sequence has sufficient complementarity with a streptococcus sequence in order to hybridize therewith and thereby
form a duplex which can be detected.

The exact length and sequence of the probe will depend on the hybridization conditions (e.g. temperature, salt condition etc.). For
example, for diagnostic applications, depending on the complexity of the analyte sequence, the nucleic acid probe typically
contains at least 10-20 nucleotides, preferably 15-25, and more preferably at least 30 nucleotides, although it may be shorter than
this. Short primers generally require cooler temperatures to form sufficiently stable hybrid complexes with the template.

Probes may be produced by synthetic procedures, such as the triester method of Matteucci et al. [J. Am. Chem. Soc. (1981)
103:3185], or according to Urdea et al. [Proc. Natl. Acad. Sci. USA (1983) 80: 7461], or using commercially available
automated oligonucleotide synthesizers.

The chemical nature of the probe can be selected according to preference. For certain applications, DNA or RNA are
appropriate. For other applications, modifications may be incorporated e.g. backbone modifications, such as phosphorothioates
or methylphosphonates, can be used to increase in vivo half-life, alter RNA affinity, increase nuclease resistance etc. [e.g. see
Agrawal & Iyer (1995) Curr Opin Biotechnol 6:12-19; Agrawal (1996) TIBTECH 14:376-387]; analogues such as peptide
nucleic acids may also be used [e.g. see Corey (1997) TIBTECH 15:224-229; Buchardt et al. (1993) TIBTECH 11:384-386].

Alternatively, the polymerase chain reaction (PCR) is another well-known means for detecting small amounts of target nucleic
acid. The assay is described in Mullis et al. [Meth. Enzymol. (1987) 155:335-350] & US patents 4,683,195 & 4,683,202. Two
"primer" nucleotides hybridize with the target nucleic acids and are used to prime the reaction. The primers can comprise
sequence that does not hybridize to the sequence of the amplification target (or its complement) to aid with duplex stability or, for
example, to incorporate a convenient restriction site. Typically, such sequence will flank the desired streptococcus sequence.

A thermostable polymerase creates copies of target nucleic acids from the primers using the original target nucleic acids as a
template. After a threshold amount of target nucleic acids are generated by the polymerase, they can be detected by more
traditional methods, such as Southern blots. When using the Southern blot method, the labelled probe will hybridize to the
streptococcus sequence (or its complement).

Also, mRNA or cDNA can be detected by traditional blotting techniques described in Sambrook et al. [supra]. mRNA, or
cDNA generated from mRNA using a polymerase enzyme, can be purified and separated using gel electrophoresis. The nucleic
acids on the gel are then blotted onto a solid support, such as nitrocellulose. The solid support is exposed to a labelled probe and
then washed to remove any unhybridized probe. Next, the duplexes containing the labelled probe are detected. Typically, the
probe is labelled with a radioactive moiety.

BRIEF DESCRIPTION OF DRAWINGS 

Figures 1 to 85, 119 to 188, 238 and 239 show SDS-PAGE analysis of total cell extracts from 
cultures of recombinant E.coli expressing GBS proteins of the invention. Lane 1 in each gel (except for 
Figure 185) contains molecular weight markers. These are 94, 67, 43, 30, 20.1 & 14.4 kDa (except for 
Figures 7, 8, 10, 11, 13, 14, 15 and 119-170, which use 250, 150, 100, 75, 50, 37, 25, 15 & 10 kDa).

Figure 86A shows the pDEST15 vector and Figure 86B shows the pDEST17-1 vector.

Figures 88 to 118 and 247 to 319 show protein characterisation data for various proteins of the 
invention. 

Figures 189 to 237 and 240 to 246 show SDS-PAGE analysis of purified GBS proteins of the 
invention. The left-hand lane contains molecular weight markers. These are 94, 67, 43, 30, 20.1 & 14.4
kDa.

MODES FOR CARRYING OUT THE INVENTION

The following examples describe nucleic acid sequences which have been identified in Streptococcus, 
along with their inferred translation products. The examples are generally in the following format: 

• a nucleotide sequence which has been identified in Streptococcus 
• the inferred translation product of this sequence

• a computer analysis (e.g. PSORT output) of the translation product, indicating antigenicity 

Most examples describe nucleotide sequences from S.agalactiae. The specific strain which was 
sequenced was from serotype V, and is a clinical strain isolated in Italy which expresses the R antigen 
(ISS/Rome/Italy collection, strain 2603 V/R). For several of these examples, the corresponding
sequences from S.pyogenes are also given. Where GBS and GAS show homology in this way, there is
conservation between species which suggests an essential function and also gives good cross-species 
reactivity. 

In contrast, several examples describe nucleotide sequences from GAS for which no homolog in GBS 
has been identified. This lack of homology gives molecules which are useful for distinguishing GAS 
from GBS and for making GAS-specific products. The same is true for GBS sequences which lack
GAS homologs e.g. these are useful for making GBS-specific products. 

The examples typically include details of homology to sequences in the public databases. Proteins that 
are similar in sequence are generally similar in both structure and function, and the homology often 
indicates a common evolutionary origin. Comparison with sequences of proteins of known function is 
widely used as a guide for the assignment of putative protein function to a new sequence and has proved
particularly useful in whole-genome analyses. 

Various tests can be used to assess the in vivo immunogenicity of the proteins identified in the examples. 
For example, the proteins can be expressed recombinantly and used to screen patient sera by 
immunoblot. A positive reaction between the protein and patient serum indicates that the patient has 
previously mounted an immune response to the protein in question i.e. the protein is an immunogen. This
method can also be used to identify immunodominant proteins. The mouse model used in the examples 
can also be used. 

The recombinant protein can also be conveniently used to prepare antibodies e.g. in a mouse. These can 
be used for direct confirmation that a protein is located on the cell-surface. Labelled antibody (e.g. 
fluorescent labelling for FACS) can be incubated with intact bacteria and the presence of label on the
bacterial surface confirms the location of the protein. 

For many GBS proteins, the following data are given: 

- SDS-PAGE analysis of total recombinant E.coli cell extracts for GBS protein expression 

- SDS-PAGE analysis after the protein purification

- Western-blot analysis of GBS total cell extract using antisera raised against recombinant proteins

- FACS and ELISA analysis against GBS using antisera raised against recombinant proteins

- Results of the in vivo passive protection assay

Details of experimental techniques used are presented below:

Sequence analysis

Open reading frames (ORFs) within nucleotide sequences were predicted using the GLIMMER program 
[Salzberg et al. (1998) Nucleic Acids Res 26:544-8]. Where necessary, start codons were modified and 
corrected manually on the basis of the presence of ribosome-binding sites and promoter regions on the 
upstream DNA sequence. 

ORFs were then screened against the non-redundant protein databases using the programs BLASTp
[Altschul et al. (1990) J. Mol. Biol. 215:403-410] and PRAZE, a modification of the Smith-Waterman
algorithm [Smith & Waterman (1981) J Mol Biol 147:195-7; see Fleischmann et al (1995) Science 
269:496-512]. 

Leader peptides within the ORFs were located using three different approaches: (i) PSORT [Nakai
(1991) Bull. Inst. Chem. Res., Kyoto Univ. 69:269-291; Horton & Nakai (1996) Intellig. Syst. Mol. Biol.
4:109-115; Horton & Nakai (1997) Intellig. Syst. Mol. Biol. 5:147-152]; (ii) SignalP [Nielsen & Krogh
(1998) in Proceedings of the Sixth International Conference on Intelligent Systems for Molecular
Biology (ISMB 6), AAAI Press, Menlo Park, California, pp. 122-130; Nielsen et al. (1999) Protein
Engineering 12:3-9; Nielsen et al. (1997) Int. J. Neural Sys. 8:581-599]; and (iii) visual inspection of the
ORF sequences. Where a signal sequence is given a "possible site" value, the value represents the
C-terminal residue of the signal peptide e.g. a "possible site" of 26 means that the signal sequence
consists of amino acids 1-26.
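
The "possible site" convention can be sketched in a few lines of Python. This is only an illustration of the numbering: the function name is hypothetical and the toy sequence is made up for the example, not taken from any sequence of the invention.

```python
def split_signal_peptide(protein, possible_site):
    """Split a protein at a 'possible site' value.

    The value is the 1-based position of the last residue of the signal
    peptide, so residues 1..possible_site form the signal sequence and the
    remainder is the predicted mature protein.
    """
    if not 0 <= possible_site <= len(protein):
        raise ValueError("possible_site must lie within the protein")
    return protein[:possible_site], protein[possible_site:]


# Made-up 30-residue sequence: 26 residues of 'signal' followed by 'AETQ'.
toy_protein = "M" + "K" * 25 + "AETQ"
signal, mature = split_signal_peptide(toy_protein, 26)
print(len(signal), mature)
```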

Lipoprotein-specific signal peptides were located using three different approaches: (i) PSORT [see
above]; (ii) the "prokaryotic membrane lipoprotein lipid attachment site" PROSITE motif [Hofmann et
al. (1999) Nucleic Acids Res. 27:215-219; Bucher & Bairoch (1994) in Proceedings 2nd International
Conference on Intelligent Systems for Molecular Biology (ISMB-94), AAAI Press, pages 53-61]; and
(iii) the FINDPATTERNS program available in the GCG Wisconsin Package, using the pattern
(M,L,V)x{9,35}LxxCx.

Transmembrane domains were located using two approaches: (i) PSORT [see above]; (ii) TopPred [von
Heijne (1992) J. Mol. Biol. 225:487-494].

LPXTG motifs, characteristic of cell-wall attached proteins in Gram-positive bacteria [Fischetti et al.
(1990) Mol Microbiol 4:1603-5] were located with FINDPATTERNS using the pattern
(L,I,V,M,Y,F)Px(T,A,S,G)(G,N,S,T,A,L).

RGD motifs, characteristic of cell-adhesion molecules [D'Souza et al. (1991) Trends Biochem Sci
16:246-50] were located using FINDPATTERNS.
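
For readers without access to FINDPATTERNS, the three motif searches can be approximated with Python regular expressions. The sketch below is not the program used in the examples; the regex translations assume that "(A,B)" means any one of the listed residues, "x" means any residue and "{9,35}" is a repeat range, and the toy sequences are invented.

```python
import re

# Assumed regex equivalents of the FINDPATTERNS patterns quoted in the text.
MOTIFS = {
    "lipoprotein": re.compile(r"[MLV].{9,35}L..C."),       # (M,L,V)x{9,35}LxxCx
    "LPXTG": re.compile(r"[LIVMYF]P.[TASG][GNSTAL]"),      # (L,I,V,M,Y,F)Px(T,A,S,G)(G,N,S,T,A,L)
    "RGD": re.compile(r"RGD"),
}


def find_motifs(protein):
    """Return {motif name: [(1-based start, matched text), ...]}.

    Only non-overlapping matches are reported, scanning left to right.
    """
    return {
        name: [(m.start() + 1, m.group()) for m in pattern.finditer(protein)]
        for name, pattern in MOTIFS.items()
    }


# Invented toy sequence containing an RGD and an LPKTG motif.
for name, hits in find_motifs("AAARGDAAALPKTGAAA").items():
    print(name, hits)
```

Note that the lipoprotein pattern is only meaningful near the N-terminus, so a real screen would presumably restrict where a hit may start; that restriction is left out here.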

Enzymes belonging to the glycolytic pathway were also selected as antigens, because these have been
found experimentally expressed on the surface of Streptococci [e.g. Pancholi & Fischetti (1992) J Exp
Med 176:415-26; Pancholi & Fischetti (1998) J Biol Chem 273:14503-15].

Cloning, expression and purification of proteins 

GBS genes were cloned to facilitate expression in E.coli as two different types of fusion proteins: 

a) proteins having a hexa-histidine tag at the amino-terminus (His-gbs) 

b) proteins having a GST fusion partner at the amino-terminus (Gst-gbs) 

Cloning was performed using the Gateway™ technology (Life Technologies), which is based on the site-
specific recombination reactions that mediate integration and excision of phage lambda into and from the
E.coli genome. A single cloning experiment included the following steps: 

1- Amplification of GBS chromosomal DNA to obtain a PCR product coding for a single ORF 
flanked by attB recombination sites. 

2- Insertion of the PCR product into a pDONR vector (containing attP sites) through a BP reaction
(attB x attP sites). This reaction gives a so called 'pEntry' vector, which now contains attL sites
flanking the insert.

3- Insertion of the GBS gene into E.coli expression vectors (pDestination vectors, containing attR 
sites) through a LR reaction between pEntry and pDestination plasmids (attL x attR sites). 

A) Chromosomal DNA preparation

For chromosomal DNA preparation, GBS strain 2603 V/R (Istituto Superiore Sanita, Rome) was grown
to exponential phase in 2 litres TH Broth (Difco) at 37°C, harvested by centrifugation, and dissolved in
40 ml TES (50 mM Tris pH 8, 5 mM EDTA pH 8, 20% sucrose). After addition of 2.5 ml lysozyme
solution (25 mg/ml in TES) and 0.5 ml mutanolysin (Sigma M-9901, 25000 U/ml in H2O), the suspension
was incubated at 37°C for 1 hour. 1 ml RNase (20 mg/ml) and 0.1 ml proteinase K (20 mg/ml) were
added and incubation was continued for 30 min. at 37°C.

Cell lysis was obtained by adding 5 ml sarkosyl solution (10% N-laurylsarcosine in 250 mM EDTA pH
8.0), and incubating 1 hour at 37°C with frequent inversion. After sequential extraction with phenol,
phenol-chloroform and chloroform, DNA was precipitated with 0.3M sodium acetate pH 5.2 and 2
volumes of absolute ethanol. The DNA pellet was rinsed with 70% ethanol and dissolved in TE buffer
(10 mM Tris-HCl, 1 mM EDTA, pH 8). DNA concentration was evaluated by OD260.

B) Oligonucleotide design

Synthetic oligonucleotide primers were designed on the basis of the coding sequence of each ORF. The
aim was to express the protein's extracellular region. Accordingly, predicted signal peptides were
omitted (by deducing the 5' end amplification primer sequence immediately downstream from the
predicted leader sequence) and C-terminal cell-wall anchoring regions were removed (e.g. LPXTG motifs
and downstream amino acids). Where additional nucleotides have been deleted, this is indicated by the
suffix 'd' (e.g. 'GBS352d' - see Table V). Conversely, a suffix 'L' refers to expression without these
deletions. Deletions of C- or N-terminal residues were also sometimes made, as indicated by a 'C' or 'N'
suffix.

The amino acid sequences of the expressed GBS proteins (including 'd' and 'L' forms etc.) are
definitively defined by the sequences of the oligonucleotide primers given in Table II.

The 5' tails of the forward and reverse primers included attB1 and attB2 sites respectively:

Forward primers: 5'-GGGGACAAGTTTGTACAAAAAAGCAGGCTCT-ORF in frame-3' (the TCT 
sequence preceding the ORF was omitted when the ORF's first coding triplet began with T). 

Reverse primers: 5'-GGGGACCACTTTGTACAAGAAAGCTGGGTT-ORF reverse complement-3'.

The number of nucleotides which hybridized to the sequence to be amplified depended on the melting 
temperature of the primers, which was determined as described by Breslauer et al. [PNAS USA (1986) 
83:3746-50]. The average melting temperature of the selected oligos was 50-55°C for the hybridizing 
region and 80-85°C for the whole oligos. 
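
The primer layout described above can be sketched as follows. This is a minimal illustration, not the design procedure actually used: the hybridizing length is passed in as a fixed, hypothetical `anneal_len` parameter instead of being derived from a Breslauer melting-temperature calculation, the TCT rule is applied literally as stated in the text, and the toy ORF is invented.

```python
# Tails taken from the primer formulas in the text; the forward tail is split so
# that the trailing TCT can be dropped when the ORF starts with T.
ATTB1_CORE = "GGGGACAAGTTTGTACAAAAAAGCAGGC"
ATTB1_TCT = "TCT"
ATTB2_TAIL = "GGGGACCACTTTGTACAAGAAAGCTGGGTT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_primers(orf, anneal_len=20):
    """Return (forward, reverse) primers, 5'->3', for an in-frame ORF fragment.

    anneal_len is an assumed fixed hybridizing length (hypothetical parameter);
    the text instead chose this length from the primer melting temperature.
    """
    orf = orf.upper()
    tct = "" if orf.startswith("T") else ATTB1_TCT
    forward = ATTB1_CORE + tct + orf[:anneal_len]
    reverse = ATTB2_TAIL + reverse_complement(orf[-anneal_len:])
    return forward, reverse


# Invented 30-nt toy ORF, used only to show the layout.
fwd, rev = design_primers("ATGGCTAAAGGTGAACTGCGTACCGATTAA", anneal_len=10)
print(fwd)
print(rev)
```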

C) Amplification

The standard PCR protocol was as follows: 50 ng genomic DNA were used as template in the presence
of 0.5 µM each primer, 200 µM each dNTP, 1.5 mM MgCl2, 1x buffer minus Mg++ (Gibco-BRL) and 2
units of Taq DNA polymerase (Platinum Taq, Gibco-BRL) in a final volume of 100 µl. Each sample
underwent a double-step of amplification: 5 cycles performed using as the hybridizing temperature 50°C,
followed by 25 cycles at 68°C.

The standard cycles were as follows:

Denaturation: 94°C, 2 min

5 cycles:
Denaturation: 94°C, 30 seconds
Hybridization: 50°C, 50 seconds
Elongation: 72°C, 1 min. or 2 min. and 40 sec.

25 cycles:
Denaturation: 94°C, 30 seconds
Hybridization: 68°C, 50 seconds
Elongation: 72°C, 1 min. or 2 min. and 40 sec.

Elongation time was 1 minute for ORFs shorter than 2000bp and 2:40 minutes for ORFs longer than 
2000bp. Amplifications were performed using a Gene Amp PCR system 9600 (Perkin Elmer). 
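
The cycling scheme and the elongation-time rule can be written out as a small Python sketch. The function names and the data layout are hypothetical; the text does not say how an ORF of exactly 2000 bp was handled, so treating it as "long" here is an assumption. Ramp times are ignored in the total.

```python
def pcr_program(orf_length_bp):
    """Build the two-stage cycling program as a list of stages.

    Each step is a (temperature in deg C, duration in seconds) tuple.
    Assumption: ORFs of exactly 2000 bp get the longer elongation time.
    """
    elongation_s = 60 if orf_length_bp < 2000 else 160  # 1 min or 2 min 40 s
    return [
        {"cycles": 1, "steps": [(94, 120)]},
        {"cycles": 5, "steps": [(94, 30), (50, 50), (72, elongation_s)]},
        {"cycles": 25, "steps": [(94, 30), (68, 50), (72, elongation_s)]},
    ]


def total_seconds(program):
    """Sum of programmed hold times, ignoring temperature ramps."""
    return sum(stage["cycles"] * sum(t for _, t in stage["steps"]) for stage in program)


for length in (1500, 2500):
    print(length, "bp:", total_seconds(pcr_program(length)) / 60, "min of hold time")
```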

To check amplification results, 2 µl of each PCR product were loaded onto 1-1.5% agarose gel and the
size of amplified fragments was compared with DNA molecular weight standards (DNA marker IX
Roche, 1 kb DNA ladder Biolabs).

Single band PCR products were purified by PEG precipitation: 300 µl of TE buffer and 200 µl of 30%
PEG 8000/30 mM MgCl2 were added to 100 µl PCR reaction. After vortexing, the DNA was centrifuged
for 20 min at 10000g, washed with 1 vol. 70% ethanol and the pellet dissolved in 30 µl TE. PCR
products smaller than 350 bp were purified using a PCR purification Kit (Qiagen) and eluted with 30 µl
of the provided elution buffer.

In order to evaluate the yield, 2 µl of the purified DNA were subjected to agarose gel electrophoresis and
compared to titrated molecular weight standards.

D) Cloning of PCR products into expression vectors

Cloning was performed following the Gateway™ technology's "one-tube protocol", which consists of a
two step reaction (BP and LR) for direct insertion of PCR products into expression vectors.

BP reaction (attB x attP sites): The reaction allowed insertion of the PCR product into a pDONR
vector. The pDONR™ 201 vector we used contains the killer toxin gene ccdB between attP1 and attP2
sites to minimize background colonies lacking the PCR insert, and a selectable marker gene for
kanamycin resistance. The reaction resulted in a so called pEntry vector, in which the GBS gene was
located between attL1 and attL2 sites.

60 fmol of PCR product and 100 ng of pDONR™ 201 vector were incubated with 2.5 µl of BP
clonase™ in a final volume of 12.5 µl for 4 hours at 25°C.
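
Because the PCR product is specified in fmol while yields are usually estimated in ng, a conversion is needed for each ORF length. The sketch below assumes the common approximation of 660 g/mol per base pair of double-stranded DNA, which is not stated in the text; the function name and the example lengths are hypothetical.

```python
AVERAGE_BP_MASS = 660.0  # g/mol per base pair; an assumed standard approximation


def fmol_to_ng(fmol, length_bp):
    """Mass in ng of `fmol` femtomoles of a double-stranded fragment of `length_bp`.

    fmol * 1e-15 mol * (length_bp * 660 g/mol) * 1e9 ng/g
    simplifies to fmol * length_bp * 660 / 1e6.
    """
    return fmol * length_bp * AVERAGE_BP_MASS / 1e6


# 60 fmol of PCR product for a few illustrative fragment lengths.
for length in (500, 1000, 2000):
    print(length, "bp ->", round(fmol_to_ng(60, length), 1), "ng")
```

Under this assumption, 60 fmol of a 1000 bp product corresponds to roughly 40 ng.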

LR reaction (attL x attR sites): The reaction allowed the insertion of the GBS gene, now present in the
pEntry vector, into E.coli expression vectors (pDestination vectors, containing attR sites). Two
pDestination vectors were used (pDEST15 for N-terminal GST fusions - Figure 86; and pDEST17-1
for N-terminal His-tagged fusions - Figure 87). Both allow transcription of the ORF fusion coding
mRNA under T7 RNA polymerase promoter [Studier et al (1990) Meth. Enzymol 185: 60ff].

To 5 µl of BP reaction were added 0.25 µl of 0.75 M NaCl, 100 ng of destination vector and 1.5 µl of
LR clonase™. The reaction was incubated at 25°C for 2 hours and stopped with 1 µl of 1 mg/ml
proteinase K solution at 37°C for 15 min.

1 µl of the completed reaction was used to transform 50 µl electrocompetent BL21-SI™ cells (0.1 cm,
200 ohms, 25 µF). BL21-SI cells contain an integrated T7 RNA polymerase gene under the control of
the salt-inducible proU promoter [Gowrishankar (1985) J. Bacteriol. 164:434ff]. After electroporation
cells were diluted in 1 ml SOC medium (20 g/l bacto-tryptone, 5 g/l yeast extract, 0.58 g/l NaCl, 0.186 g/l
KCl, 20 mM glucose, 10 mM MgCl2) and incubated at 37°C for 1 hour. 200 µl cells were plated onto
LBON plates (Luria Broth medium without NaCl) containing 100 µg/ml ampicillin. Plates were then
incubated for 16 hours at 37°C.

Entry clones: In order to allow the future preparation of Gateway compatible pEntry plasmids
containing genes which might turn out to be of interest after immunological assays, 2.5 µl of BP reaction were
incubated for 15 min in the presence of 3 µl 0.15 mg/ml proteinase K solution and then kept at -20°C.
The reaction was in this way available to transform E.coli competent cells so as to produce Entry clones
for future introduction of the genes in other Destination vectors.

E) Protein expression 

Single colonies derived from the transformation of LR reactions were inoculated as small-scale cultures
in 3 ml LBON 100 µg/ml ampicillin for overnight growth at 25°C. 50-200 µl of the culture was inoculated
in 3 ml LBON/Amp to an initial OD600 of 0.1. The cultures were grown at 37°C until OD600 0.4-0.6
and recombinant protein expression was induced by adding NaCl to a final concentration of 0.3 M. After
2 hour incubation the final OD was checked and the cultures were cooled on ice. 0.5 OD600 of cells were
harvested by centrifugation. The cell pellet was suspended in 50 µl of protein Loading Sample Buffer (50
mM TRIS-HCl pH 6.8, 0.5% w/v SDS, 2.5% v/v glycerin, 0.05% w/v Bromophenol Blue, 100 mM
DTT) and incubated at 100°C for 5 min. 10 µl of sample was analyzed by SDS-PAGE and Coomassie
Blue staining to verify the presence of induced protein band.
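
The two culture volumes implied by this protocol can be worked out as below. This is an interpretive sketch: it assumes that "0.5 OD600 of cells" means a culture volume (ml) multiplied by its OD600 equal to 0.5, and it ignores the small volume added by the inoculum. The function names and example OD readings are hypothetical.

```python
def inoculum_volume_ul(overnight_od, culture_ml=3.0, target_od=0.1):
    """Volume of overnight culture (µl) needed to start `culture_ml` at `target_od`.

    Simple dilution, neglecting the volume of the inoculum itself.
    """
    return target_od * culture_ml * 1000.0 / overnight_od


def harvest_volume_ml(final_od, od_units=0.5):
    """Culture volume (ml) holding `od_units` OD600 units (assumed: ml x OD600)."""
    return od_units / final_od


# Illustrative readings: overnight cultures at OD600 1.5 and 6.0, induced culture at 1.25.
print(inoculum_volume_ul(1.5), inoculum_volume_ul(6.0))
print(harvest_volume_ml(1.25))
```

Under these assumptions the stated 50-200 µl inoculum range corresponds to overnight cultures with an OD600 of roughly 6 down to 1.5.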

F) Purification of the recombinant proteins 

Single colonies were inoculated in 25 ml LBON 100 µg/ml ampicillin and grown at 25°C overnight. The
overnight culture was inoculated in 500 ml LBON/amp and grown under shaking at 25°C until OD600
values of 0.4-0.6. Protein expression was then induced by adding NaCl to a final concentration of 0.3 M.
After 3 hours incubation at 25°C the final OD600 was checked and the cultures were cooled on ice. After
centrifugation at 6000 rpm (JA10 rotor, Beckman) for 20 min., the cell pellet was processed for
purification or frozen at -20°C.

Proteins were purified in 1 of 3 ways depending on the fusion partner and the protein's solubility:

Purification of soluble His-tagged proteins from E.coli 

1. Transfer pellets from -20°C to ice bath and reconstitute each pellet with 10 ml B-PER™ solution
(Bacterial-Protein Extraction Reagent, Pierce cat. 78266), 10 µl of a 100 mM MgCl2 solution, 50
µl of DNAse I (Sigma D-4263, 100 Kunits in PBS) and 100 µl of 100 mg/ml lysozyme in PBS
(Sigma L-7651, final concentration 1 mg/ml).

2. Transfer resuspended pellets in 50 ml centrifuge tubes and leave at room temperature for 30-40 
minutes, vortexing 3-4 times. 

3. Centrifuge 15-20 minutes at about 30-40000 x g. 

4. Prepare Poly-Prep (Bio-Rad) columns containing 1 ml of Fast Flow Ni-activated Chelating 
Sepharose (Pharmacia). Equilibrate with 50 mM phosphate buffer, 300 mM NaCl, pH 8.0. 

5. Store the pellet at -20°C, and load the supernatant on to the columns. 

6. Discard the flow through. 

7. Wash with 10 ml 20 mM imidazole buffer, 50 mM phosphate, 300 mM NaCl, pH 8.0. 

8. Elute the proteins bound to the columns with 4.5 ml (1.5 ml + 1.5 ml + 1.5 ml) 250 mM imidazole 
buffer, 50 mM phosphate, 300 mM NaCl, pH 8.0 and collect three fractions of ~1.5 ml each. Add 
to each tube 15 µl DTT 200 mM (final concentration 2 mM).

9. Measure the protein concentration of the collected fractions with the Bradford method and analyse 
the proteins by SDS-PAGE. 

10. Store the collected fractions at +4°C while waiting for the results of the SDS-PAGE analysis. 

11. For immunisation prepare 4-5 aliquots of 20-100 µg each in 0.5 ml in 40% glycerol. The dilution
buffer is the above elution buffer, plus 2 mM DTT. Store the aliquots at -20°C until immunisation.
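
The final concentrations quoted in the steps above follow from a simple dilution calculation, sketched here in Python. The helper name is hypothetical, and the volumes are those given in steps 1 and 8 (15 µl of 200 mM DTT into a ~1.5 ml fraction; 100 µl of 100 mg/ml lysozyme into 10 ml B-PER plus the 10 µl and 50 µl additions).

```python
def final_concentration(stock_conc, stock_volume, other_volume):
    """Concentration after adding `stock_volume` of stock to `other_volume`.

    Volumes must share one unit; the result is in the unit of `stock_conc`.
    """
    return stock_conc * stock_volume / (stock_volume + other_volume)


# DTT: 15 µl of 200 mM stock added to a 1500 µl fraction (step 8).
dtt_mm = final_concentration(200.0, 15.0, 1500.0)

# Lysozyme: 100 µl of 100 mg/ml added to 10 ml B-PER + 10 µl MgCl2 + 50 µl DNAse I (step 1).
lysozyme_mg_ml = final_concentration(100.0, 100.0, 10000.0 + 10.0 + 50.0)

print(round(dtt_mm, 2), "mM DTT;", round(lysozyme_mg_ml, 2), "mg/ml lysozyme")
```

Both values land within about 2% of the nominal 2 mM and 1 mg/ml figures given in the protocol.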

Purification of His-tagged proteins from inclusion bodies 

1. Bacteria are collected from 500 ml cultures by centrifugation. If required store bacterial pellets at
-20°C. Transfer the pellets from -20°C to room temperature and reconstitute each pellet with 10
ml B-PER™ solution, 10 µl of a 100 mM MgCl2 solution (final 1 mM), 50 µl of DNAse I
equivalent to 100 Kunits in PBS and 100 µl of a 100 mg/ml lysozyme (Sigma L-7651) solution
in PBS (equivalent to 10 mg, final concentration 1 mg/ml).

2. Transfer the resuspended pellets in 50 ml centrifuge tubes and leave at room temperature for 30-40
minutes, vortexing 3-4 times.

3. Centrifuge 15 minutes at 30-40000 x g and collect the pellets.

4. Dissolve the pellets with 50 mM TRIS-HCl, 1 mM TCEP {Tris(2-carboxyethyl)-phosphine
hydrochloride, Pierce}, 6M guanidine hydrochloride, pH 8.5. Stir for ~10 min. with a magnetic
bar.

5. Centrifuge as described above, and collect the supernatant.

6. Prepare Poly-Prep (Bio-Rad) columns containing 1 ml of Fast Flow Ni-activated Chelating
Sepharose (Pharmacia). Wash the columns twice with 5 ml of H2O and equilibrate with 50 mM
TRIS-HCl, 1 mM TCEP, 6M guanidine hydrochloride, pH 8.5.

7. Load the supernatants from step 5 onto the columns, and wash with 5 ml of 50 mM TRIS-HCl
buffer, 1 mM TCEP, 6M urea, pH 8.5.

8. Wash the columns with 10 ml of 20 mM imidazole, 50 mM TRIS-HCl , 6M urea, 1 mM TCEP, 
pH 8.5. Collect and set aside the first 5 ml for possible further controls. 

9. Elute proteins bound to columns with 4.5 ml buffer containing 250 mM imidazole, 50 mM TRIS-HCl,
6M urea, 1 mM TCEP, pH 8.5. Add the elution buffer in three 1.5 ml aliquots, and collect
the corresponding three fractions. Add to each fraction 15 µl DTT (final concentration 2 mM).

10. Measure eluted protein concentration with Bradford method and analyse proteins by SDS-PAGE. 

11. Dialyse overnight the selected fraction against 50 mM Na phosphate buffer, pH 8.8, containing 
10% glycerol, 0.5 M arginine, 5 mM reduced glutathione, 0.5 mM oxidized glutathione, 2 M urea. 

12. Dialyse against 50 mM Na phosphate buffer, pH 8.8, containing 10% glycerol, 0.5 M arginine, 5 
mM reduced glutathione, 0.5 mM oxidized glutathione. 

13. Clarify the dialysed protein preparation by centrifugation and discard the non-soluble material and 
measure the protein concentration with the Bradford method. 

14. For each protein destined for immunisation prepare 4-5 aliquots of 20-100 µg each in 0.5 ml
after having adjusted the glycerol content up to 40%. Store the prepared aliquots at -20°C until
immunisation.

Purification of GST-fusion proteins from E.coli 

1. Bacteria are collected from 500 ml cultures by centrifugation. If required store bacterial pellets at
-20°C. Transfer the pellets from -20°C to room temperature and reconstitute each pellet with 10
ml B-PER™ solution, 10 µl of a 100 mM MgCl2 solution (final 1 mM), 50 µl of DNAse I
equivalent to 100 Kunitz units in PBS and 100 µl of a 100 mg/ml lysozyme (Sigma L-7651) solution
in PBS (equivalent to 10 mg, final concentration 1 mg/ml).

2. Transfer the resuspended pellets into 50 ml centrifuge tubes and leave at room temperature for 30-40
minutes, vortexing 3-4 times.

3. Centrifuge 15-20 minutes at about 30,000-40,000 x g.

4. Discard centrifugation pellets and load supernatants onto the chromatography columns, as 
follows. 

5. Prepare Poly-Prep (Bio-Rad) columns containing 0.5 ml of Glutathione-Sepharose 4B resin. Wash
the columns twice with 1 ml of H2O and equilibrate with 10 ml PBS, pH 7.4.

6. Load supernatants on to the columns and discard the flow through. 

7. Wash the columns with 10 ml PBS, pH 7.4. 

8. Elute proteins bound to columns with 4.5 ml of 50 mM TRIS buffer, 10 mM reduced glutathione, 
pH 8.0, adding 1.5 ml + 1.5 ml + 1.5 ml and collecting the respective 3 fractions of ~1.5 ml each. 

9. Measure protein concentration of the fractions with the Bradford method and analyse the proteins
by SDS-PAGE.

10. Store the collected fractions at +4°C while waiting for the results of the SDS-PAGE analysis. 

11. For each protein destined for immunisation prepare 4-5 aliquots of 20-100 µg each in 0.5 ml of
40% glycerol. The dilution buffer is 50 mM TRIS-HCl, 2 mM DTT, pH 8.0. Store the aliquots at
-20°C until immunisation.

Figures 167 to 170 and 238 to 239 

For the experiments shown in Figures 167 to 170, Figure 238 and lanes 2-6 of Figure 239, the GBS
proteins were fused at the N-terminus to thioredoxin and at the C-terminus to a poly-His tail. The plasmid
used for cloning is pBAD-DEST49 (Invitrogen Gateway™ technology) and expression is under the
control of an L(+)-Arabinose dependent promoter. For the production of these GBS antigens, bacteria
are grown on RM medium (6 g/l Na2HPO4, 3 g/l KH2PO4, 0.5 g/l NaCl, 1 g/l NH4Cl, pH 7.4, 2%
casaminoacids, 0.2% glucose, 1 mM MgCl2) containing 100 µg/ml ampicillin. After incubation at 37°C
until cells reach OD600 = 0.5, protein expression is induced by adding 0.2% (v/v) L(+)Arabinose for 3
hours.
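
The salt components of the RM medium are given per litre, so the amounts for any batch size follow by simple scaling. The sketch below covers only the four salts quoted in g/l; the dictionary and function names are illustrative, and the percentage components (casaminoacids, glucose) are left out because the text does not state whether they are w/v.

```python
# Salt components of RM medium as quoted in the text, in grams per litre.
RM_SALTS_G_PER_L = {
    "Na2HPO4": 6.0,
    "KH2PO4": 3.0,
    "NaCl": 0.5,
    "NH4Cl": 1.0,
}


def salts_for_volume(volume_l: float) -> dict:
    """Return grams of each salt needed for volume_l litres of medium."""
    if volume_l <= 0:
        raise ValueError("volume must be positive")
    return {name: g_per_l * volume_l for name, g_per_l in RM_SALTS_G_PER_L.items()}


# Amounts for one 500 ml culture, the culture size used in the purification protocols.
batch = salts_for_volume(0.5)
for name, grams in batch.items():
    print(f"{name}: {grams:.2f} g")
```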

Immunisations with GBS proteins 

The purified proteins were used to immunise groups of four CD-1 mice intraperitoneally. 20 µg of each
purified protein was injected in Freund's adjuvant at days 1, 21 & 35. Immune responses were
monitored by using samples taken on day 0 & 49. Sera were analysed as pools of sera from each group
of mice.

FACScan bacteria Binding Assay procedure. 

GBS serotype V 2603 V/R strain was plated on TSA blood agar plates and incubated overnight at 37°C.
Bacterial colonies were collected from the plates using a sterile dacron swab and inoculated into 100 ml
Todd Hewitt Broth. Bacterial growth was monitored every 30 minutes by following OD600. Bacteria were
grown until OD600 = 0.7-0.8. The culture was centrifuged for 20 minutes at 5000 rpm. The supernatant
was discarded and bacteria were washed once with PBS, resuspended in l A culture volume of PBS 
containing 0.05% paraformaldehyde, and incubated for 1 hour at 37°C and then overnight at 4°C. 

50 µl bacterial cells (OD600 0.1) were washed once with PBS and resuspended in 20 µl blocking serum
(Newborn Calf Serum, Sigma) and incubated for 20 minutes at room temperature. The cells were then
incubated with 100 µl diluted sera (1:200) in dilution buffer (20% Newborn Calf Serum, 0.1% BSA in
PBS) for 1 hour at 4°C. Cells were centrifuged at 5000 rpm, the supernatant aspirated and cells washed
by adding 200 µl washing buffer (0.1% BSA in PBS). 50 µl R-Phycoerythrin conjugated F(ab)2 goat
anti-mouse, diluted 1:100 in dilution buffer, was added to each sample and incubated for 1 hour at 4°C. Cells
were spun down by centrifugation at 5000 rpm and washed by adding 200 µl of washing buffer. The
supernatant was aspirated and cells resuspended in 200 µl PBS. Samples were transferred to FACScan
tubes and read. The FACScan settings were: FL2 on; FSC-H threshold: 54; FSC PMT
Voltage: E 02; SSC PMT: 516; Amp. Gains 2.63; FL-2 PMT: 728. Compensation values: 0.

Samples were considered positive if they had a Δ mean value > 50 channel values.
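
The positivity criterion can be written as a one-line comparison. This is a sketch under a stated assumption: the Δ mean is taken here as the mean fluorescence channel of the immune serum sample minus that of the matching preimmune sample, which the text does not spell out. The function name and the example channel values are hypothetical.

```python
DELTA_MEAN_THRESHOLD = 50.0  # channel values, as stated in the assay description


def is_facs_positive(immune_mean: float, preimmune_mean: float,
                     threshold: float = DELTA_MEAN_THRESHOLD) -> bool:
    """Return True when the delta mean strictly exceeds the threshold.

    Assumption: delta mean = immune mean channel - preimmune mean channel.
    """
    return (immune_mean - preimmune_mean) > threshold


# Hypothetical channel values, for illustration only.
print(is_facs_positive(immune_mean=180.0, preimmune_mean=100.0))
print(is_facs_positive(immune_mean=120.0, preimmune_mean=100.0))
```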

Whole Extracts preparation

GBS serotype III COH1 strain and serotype V 2603 V/R strain cells were grown overnight in Todd
Hewitt Broth. 1 ml of the culture was inoculated into 100 ml Todd Hewitt Broth. Bacterial growth was
monitored every 30 minutes by following OD600. The bacteria were grown until the OD reached 0.7-0.8.
The culture was centrifuged for 20 minutes at 5000 rpm. The supernatant was discarded and bacteria
were washed once with PBS, resuspended in 2 ml 50 mM Tris-HCl, pH 6.8 adding 400 units of
Mutanolysin (Sigma-Aldrich) and incubated 3 hrs at 37°C. After 3 cycles of freeze/thaw, cellular debris
was removed by centrifugation at 14000 g for 15 minutes and the protein concentration of the
supernatant was measured by the Bio-Rad Protein assay, using BSA as a standard.

Western blotting 

Purified proteins (50 ng) and total cell extracts (25 µg) derived from GBS serotype III COH1 strain and
serotype V 2603 V/R strain were loaded on 12% or 15% SDS-PAGE and transferred to a nitrocellulose
membrane. The transfer was performed for 1 hour at 100V at 4°C, in transferring buffer (25 mM Tris
base, 192 mM glycine, 20% methanol). The membrane was saturated by overnight incubation at 4°C in
saturation buffer (5% skimmed milk, 0.1% Tween 20 in PBS). The membrane was incubated for 1 hour
at room temperature with 1:1000 mouse sera diluted in saturation buffer. The membrane was washed
twice with washing buffer (3% skimmed milk, 0.1% Tween 20 in PBS) and incubated for 1 hour with a
1:5000 dilution of horseradish peroxidase labelled anti-mouse Ig (Bio-Rad). The membrane was washed
twice with 0.1% Tween 20 in PBS and developed with the Opti-4CN Substrate Kit (Bio-Rad). The
reaction was stopped by adding water.

Unless otherwise indicated, lanes 1, 2 and 3 of blots in the drawings are: (1) the purified protein; (2)
GBS-III extracts; and (3) GBS-V extracts. Molecular weight markers are also shown.

In vivo passive protection assay in neonatal sepsis mouse model. 

The immune sera collected from the CD1 immunized mice were tested in a mouse neonatal sepsis model
to verify their protective efficacy in mice challenged with GBS serotype III. Newborn Balb/C littermates
were randomly divided into two groups within 24 hrs from birth and injected subcutaneously with 25 µl of
diluted sera (1:15) from immunized CD1 adult mice. One group received preimmune sera, the other
received immune sera. Four hours later all pups were challenged with a 75% lethal dose of the GBS
serotype III COH1 strain. The challenge dose, obtained by diluting a mid log phase culture, was administered
subcutaneously in 25 µl of saline. The number of pups surviving GBS infection was assessed every 12
hours for 4 days. Results are in Table III.
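
The read-out of this assay is the fraction of pups surviving in each group. As a minimal sketch of how such counts could be turned into percentages, the helper below is illustrative only. The counts are hypothetical and are not data from Table III.

```python
def survival_percent(survivors: int, challenged: int) -> float:
    """Percentage of challenged pups still alive at the end of observation."""
    if challenged <= 0:
        raise ValueError("challenged must be positive")
    if not 0 <= survivors <= challenged:
        raise ValueError("survivors must lie between 0 and challenged")
    return 100.0 * survivors / challenged


# Hypothetical counts for illustration; NOT taken from Table III.
preimmune = survival_percent(survivors=5, challenged=20)
immune = survival_percent(survivors=14, challenged=20)
print(f"preimmune sera: {preimmune:.0f}% survival, immune sera: {immune:.0f}% survival")
```

With a 75% lethal dose, survival in the preimmune group would be expected to be roughly 25%, so a clearly higher value in the immune group is what indicates passive protection.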

Example 1

A DNA sequence (GBSx1402) was identified in S.agalactiae <SEQ ID 1> which encodes the amino acid
sequence <SEQ ID 2>. Analysis of this protein sequence reveals the following: 

Possible site: 27
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -0.48 Transmembrane 169 - 185 ( 169 - 185)

Final Results
bacterial membrane --- Certainty=0.1192 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
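
The "Final Results" blocks that recur throughout the Examples share one line layout (location, certainty value, qualifier), so they can be read mechanically. The following is a sketch only: the regular expression and function name are illustrative, and the input is assumed to be the cleaned form of the report lines rather than raw scanned text.

```python
import re

# One result line: "bacterial <location> --- Certainty=<value> (<qualifier>)".
RESULT_LINE = re.compile(
    r"bacterial\s+(\w+)\s*-*\s*Certainty\s*=\s*([0-9.]+)\s*\(([^)]+)\)"
)


def parse_final_results(text: str) -> dict:
    """Map each predicted location to (certainty, qualifier)."""
    results = {}
    for match in RESULT_LINE.finditer(text):
        location, certainty, qualifier = match.groups()
        results[location] = (float(certainty), qualifier.strip())
    return results


report = """\
bacterial membrane --- Certainty=0.1192 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
"""

results = parse_final_results(report)
best = max(results, key=lambda location: results[location][0])
print(best, results[best])
```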

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB88235 GB:AL353012 hypothetical serine-rich repeat protein 
[Schizosaccharomyces pombe] 
Identities = 41/152 (26%) , Positives = 75/152 (48%) , Gaps = 4/152 (2%) 

Query: 22 SSIGYADTSDKNTDTSVVTTTLSEEKKSDELDQSSTGSSSENESSSSSEPETNPSTNPPT 81 
SS +++S +++D+S 

Sbjct: 
Sbjct 
Sbj< 

There is also homology to SEQ ID 1984. 

A related GBS gene <SEQ ID 8785> and protein <SEQ ID 8786> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 5
McG: Discrim Score: 6.72
GvH: Signal Score (-7.5): -4.34
Possible site: 27
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -0.48 threshold: 0.0
INTEGRAL Likelihood = -0.48 Transmembrane 169 - 185 ( 169 - 185)
PERIPHERAL Likelihood = 0.16 7
modified ALOM score: 0.60

*** Reasoning Step: 3

Final Results
bacterial membrane --- Certainty=0.1192 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

LPXTG motif: 159-163 
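
An LPXTG motif position such as the one reported above can be located with a simple pattern search, where X stands for any residue. The sketch below is illustrative: the function name is an assumption and the test sequence is hypothetical, since the amino acid sequence itself is not reproduced in this section. Positions are 1-based and inclusive, matching the style of the report.

```python
import re

LPXTG = re.compile(r"LP.TG")  # X = any single residue


def find_lpxtg(sequence: str) -> list:
    """Return (start, end) positions, 1-based inclusive, of each LPXTG motif."""
    return [(m.start() + 1, m.end()) for m in LPXTG.finditer(sequence.upper())]


# Hypothetical sequence for illustration only (not SEQ ID 8786).
example = "AAALPKTGEE"
print(find_lpxtg(example))
```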

SEQ ID 2 (GBS4) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 9 (lane 3; MW 43.1kDa) and Figure 63 (lane 4; MW 50kDa). It was also 
expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 12 
(lane 7; MW 30kDa), Figure 63 (lane 3; MW 30kDa) and in Figure 178 (lane 3; MW 30kDa). 

GBS4-GST was purified as shown in Figure 190 (lane 6) and Figure 209 (lane 8). 



WO 02/34771 



-41- 



PCT/GB01/04789 



Purified GBS4-His is shown in Figures 89A, 191 (lane 10), 209 (lane 7) and 228 (lanes 9 & 10). 

The purified GBS4-His fusion product was used to immunise mice (lane 2 product; 20 µg/mouse). The
resulting antiserum was used for Western blot (Figure 89B), FACS, and in the in vivo passive protection
assay (Table III). These tests confirm that the protein is immunoaccessible on GBS bacteria and that it is an
effective protective immunogen.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2 

A DNA sequence (GBSx1100) was identified in S.agalactiae <SEQ ID 3> which encodes the amino acid
sequence <SEQ ID 4>. This protein is predicted to be aggregation promoting protein. Analysis of this
protein sequence reveals the following:

Possible site: 33
>>> Seems to have a cleavable N-term signal seq.

Final Results
bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA69725 GB:Y08498 aggregation promoting protein [Lactobacillus gasseri] 
Identities = 56/103 (54%) , Positives = 69/103 (66%) , Gaps = 5/103 (4%) 

Query: 82 TASQAEAKSQPT I ENSMNS S SNLSSSDSAAKEE IARRESNGS YTAQNGQYYGRYQ 136 

TSAA+QT + + + + NS S++AAK +A RES G Y+A NGQY G+YQ

Sbjct: 195 TYSYASAQKQTTQVAQKTQTTTSYTLNASGSEAAAKAWMAGRESGGPYSAGNGQYIGKYQ 254

Query: 137 LSQSYLNGDLSPENQEKVADNYWSRYGSWSAALSFVJNSNGWY 179
LS SYL GD S NQE+VADNYV SRYGSW+ A FW +NGWY
Sbjct: 255 LSASYLGGDYSAANQERVADNYVKSRYGSWTGAQKFWQTNGWY 297

No corresponding DNA sequence was identified in S.pyogenes. 
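
The summary line of each alignment (Identities, Positives, Gaps) carries both the raw counts and a percentage, and the two can be cross-checked. The sketch below parses such a line and recomputes the percentages; the names are illustrative. For the line quoted in this Example the reported values match integer truncation rather than rounding (69/103 is 66.99%, reported as 66%), so truncation is what the helper uses.

```python
import re

FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_alignment_summary(line: str) -> dict:
    """Map each field name to (count, alignment_length, reported_percent)."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in FIELD.findall(line)
    }


def truncated_percent(count: int, length: int) -> int:
    """Percentage with the fractional part dropped."""
    return (100 * count) // length


line = "Identities = 56/103 (54%) , Positives = 69/103 (66%) , Gaps = 5/103 (4%)"
summary = parse_alignment_summary(line)
for name, (count, length, reported) in summary.items():
    print(name, count, length, reported, truncated_percent(count, length) == reported)
```

A mismatch between the recomputed and reported percentage on another summary line would be a hint that one of the digits was damaged in transcription.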

A related GBS gene <SEQ ID 8709> and protein <SEQ ID 8710> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 9
McG: Discrim Score: 2.59
GvH: Signal Score (-7.5): -0.42
Possible site: 33
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 6.79 threshold: 0.0
PERIPHERAL Likelihood = 6.79 59
modified ALOM score: -1.86

*** Reasoning Step: 3

Final Results
bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

57.5/71.3% over 92aa 
Lactobacillus gasseri 
EGAD|154417| aggregation promoting protein Insert characterized

GP|l619598|emb|CAA69725.l| |Y08498 aggregation promoting protein Insert characterized 

ORF01056(547 - 837 of 1137)
EGAD|154417|164788(205 - 297 of 297) aggregation promoting protein {Lactobacillus gasseri}
GP|1619598|emb|CAA69725.1||Y08498 aggregation promoting protein {Lactobacillus gasseri}
%Match = 14.6
%Identity = 57.4 %Similarity = 71.3
Matches = 54 Mismatches = 26 Conservative Sub.s = 13



SLNSISNADVISIGDVLKLDNSTASQAEAKSQPTIE5NSMNSSSNLSSSDSAAKEEIARRESNGSYTAQNGQYYGRYQLSQ 
:: :| I =1 hi I ::|:| |::||| =1 III I hi llll hllll 

WQRTYSAPVQQRTYSYASAQKQTTQVAQKTQTTTSYTLNASG SEAAAKAWMAGRESGGPYSAGNGQYIGKYQLSA 



747 777 807 837 867 897 927 957 

symgdlspenqekvadnywsrygswsaalsftosngwy^klikqrdllo^ 

20 III II I I I hi I I I II II 111 h I II :|||| 

SYLGGDYSAANQERVADNYVKSRYGSWTGAQKFWQTNGWY 
270 280 290 



A related GBS gene <SEQ ID 8711> and protein <SEQ ID 8712> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 9
McG: Discrim Score: 2.59
GvH: Signal Score (-7.5): -0.42
Possible site: 33
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 6.79 threshold: 0.0
PERIPHERAL Likelihood = 6.79 59
modified ALOM score: -1.86

*** Reasoning Step: 3

Final Results
bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

44.0/62.0% over 115aa 

Bacillus subtilis 

EGAD|108478| hypothetical protein Insert characterized
OMNI|NT01BS1100 p60-related protein Insert characterized
GP|2226145|emb|CAA74437.1||Y14079 hypothetical protein Insert characterized
GP|2633272|emb|CAB12776.1||Z99109 similar to cell wall-binding protein Insert characterized
PIR|B69825|B69825 cell wall-binding protein homolog yhdD - Insert characterized

ORF0174S(340 - 633 of 954) 

EGAD| 108478 |BS0936 (57 - 172 of 488) hypothetical protein {Bacillus subtilis}OMNI |NT01BS1100 
p60-related proteinGP | 2226145 | emb | CAA74437.l| |Y14079 hypothetical protein {Bacillus 
55 subtilis}GP|2633272|emb|CAB12776.l| |Z99109 similar to cell wall-binding protein {Bacillus 

subtilis}PIR|B69825|B69825 cell wall -binding protein homolog yhdD - Bacillus subtilis 
%Match =9.0 

%Identity =44.0 %Similarity =62.0 

Matches = 44 Mismatches = 35 Conservative Sub.s = 18 

60 

120 150 180 210 240 270 300 330 

*DQFMVLAFSFI * CEKLNNFT*RKLKIVFWRPFLY*FTIYL* * ISSKAKQLVTFTRYDSTRIN* * KRAYIMS ITSVKKSK 
MKKKIiAAGLTASAIVGTTLVVTPAEAATIKVKSGDSLWKLAQTYNTSVAALTS
10 20 30 40 50

360 390 435 465 495 525

PFKLGVAGLLVGASLALPLSVSAAS YTVKSGDTLS AI AKNHKTTVQELVS LNS I SNAD VI S I GDV 

I | :| :| :| | |= | MINIM || I Mill II M MM I 

5 AKHLSTTVLSIGQTLTIPGSKSSTSSSTSSSTTMKSGSSWTWSGDSLWLIANEFKMTVQELKKLNGLS-SDLIRAGQK 
70 80 90 100 110 120 130 

543 573 603 633 663 693 723 753 

LKLD NSTASQAFAKSQPTIENSMNSSSNLSSSDSAAKEE I ASS * 1 KXWILHRMDNIMEDINCLNLT*MATYLLKI 

10 ||: M-l := I « M MM 111 I- = = 1= = : 

LKVSGTVSSSSSSSKKSNSNKSSSSSSKSSSNKSSSSSSSTGTYKVQLGDSLWKIANKVMSI^ 

150 160 170 180 190 200 210 

SEQ ID 8712 (GBS166) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 30 (lane 2; MW B.lkDa).

The GBS166-His fusion product was purified (Figure 200, lane 10) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 315), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

SEQ ID 4 (GBS15) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 9 (lane 5; MW 44.8kDa), Figure 63 (lane 5; MW 44.8kDa) and Figure 66 (lane 7;
MW 45kDa). It was also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 10 (lane 4; MW 22.3kDa). It was also expressed as GBS15L. SDS-PAGE
analysis of total cell extract is shown in Figure 185 (lane 1; MW 50kDa).

Purified GBS15-GST is shown in Figure 91A, Figure 190 (lane 9), Figure 210 (lane 4) and Figure 245
(lanes 4 & 5).

The purified GBS15-GST fusion product was used to immunise mice (lane 1 + 2 products; 20 µg/mouse).
The resulting antiserum was used for Western blot (Figure 91B), FACS (Figure 91C), and in the in vivo
passive protection assay (Table III). These tests confirm that the protein is immunoaccessible on GBS
bacteria and that it is an effective protective immunogen.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 3 

A DNA sequence (GBSx0091) was identified in S.agalactiae <SEQ ID 303> which encodes the amino acid 
sequence <SEQ ID 304>. Analysis of this protein sequence reveals the following: 

Possible site: 32
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -9.56 Transmembrane 22 - 38 ( 15 - 41)

Final Results
bacterial membrane --- Certainty=0.4864 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAA72096 GB:Y11213 hypothetical protein [Streptococcus thermophilus] 
Identities = 149/274 (54%) , Positives = 208/274 (75%) , Gaps = 9/274 (3%) 

Query: 23 FLVSLLLSFGIFSLIIPKSNP--KLTKKDFLTKKvTPLNYVALGDSLTEGVGDTTSQGGF 80 
Sbjct:


5 


Query: 


81 


Sbjct: 


64 


Query: 


141 


Sbjct: 


124 


Query: 


201 


Sbjct: 


184 




255 


Sbjct: 


244 



VP+LS++L + +++QVT NYG++GNTS QILKRM I++DL+KA L+TLTVGGNDV 



KMQTVIDNWWKATKEWDASENVyFVPINDRLYKGINGKEGITES SNSQASITN 254 

+MQT++DNWN++T+EV +NVYFVP+ND LYKGINGK G+T S + S N 



DALF DHFHPNN GYQIMS+A++++IN+T+K W 
DALFEEDHFHPNNTGYQIMSDAILKRINQTKKEW 277 

A related DNA sequence was identified in S.pyogenes <SEQ ID 305> which encodes the amino acid
sequence <SEQ ID 306>. Analysis of this protein sequence reveals the following:

Possible site: 39
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -12.05 Transmembrane 18 - 34 ( 10 - 37)

Final Results
bacterial membrane --- Certainty=0.5819 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related sequence was also identified in GAS <SEQ ID 9123> which encodes the amino acid sequence 
<SEQ ID 9124>. Analysis of this protein sequence reveals the following: 

Possible site: 33
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -12.05 Transmembrane 12 - 28

Final Results
bacterial membrane --- Certainty=0.5819 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 178/282 (63%), Positives = 218/282 (77%) 

LLLWFVMNKKKILTGLSFFLVSLLLSFGIFSLIIPKSNPKLTKKDFLTKKVIPLNYVALG 64 
L LWFVMN + + +G+ FF++SL L+F + ++IIPKSN +L K DFL K+ + + YVA+G 
LRLWFVMNNRHLFSGI FFF VI SLCLAFLLLNI 1 1 PKSNSRLKKSDFLKKEQVAI QYVAIG S 0 

DSLTEGVGDTTSQGGFVPLLSESLHNRYSYQVTSVNYGVSGNTSQQILKRMTTDPQIEKD 124 
DSLTEGVGD T QGGFVPLL+ L + V NYGVSG+TSQQIL RM QI + 



L+KAD++TLTVGGNDV+AVIRK L+ L ++SF KPA Y++RL++I+ AR+DN LPI+ 



+LGIYNPFYLNFP+LT MQ VID+WN TKEW + VYFVPIND LYKGING+EGI 





5 


Sbjct: 


1 




65 


Sbjct: 


61 




125 


Sbjct: 


121 




185 


Sbjct: 


181 




245 



Q +1 NDALFTGDHFHPNN GYQIMSNAVMEKI - 
Sbjct: 241 SSC3DQTTIVNDALFTGDHFHPNNTGYQIMSWAVM3KIKKH3K 282

A related GBS gene <SEQ ID 5> and protein <SEQ ID 6> were also identified. Analysis of this protein 
sequence reveals the following: 

Lipop: Possible site: -1 Crend: 4
SRCFLG: 0
McG: Length of UR: 24
Peak Value of UR: 3.02
Net Charge of CR: 3
McG: Discrim Score: 12.27
GvH: Signal Score (-7.5): -3.44
Possible site: 22
>>> Seems to have an uncleavable N-term signal seq
Amino Acid Composition: calculated from 1
ALOM program count: 1 value: -9.66 threshold: 0.0
INTEGRAL Likelihood = -9.66 Transmembrane 12 - 28 ( 5 - 31)
PERIPHERAL Likelihood = 1.96 118
modified ALOM score: 2.43
icm1 HYPID: 7 CFP: 0.486

*** Reasoning Step: 3

Final Results
bacterial membrane --- Certainty=0.4864 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

56.0/80.3% over 272aa
GP|1850894| hypothetical protein Insert characterized

ORF02006(367 - 1164 of 1467)
GP|1850894|emb|CAA72096.1||Y11213(5 - 277 of 280) hypothetical protein {Streptococcus thermophilus}
%Match = 30.8
%Identity = 56.0 %Similarity = 80.2
Matches = 150 Mismatches = 49 Conservative Sub.s = 65

141 171 201 231 261 291 321 351 

40 AV*RPSANG*IILLKVPKHEKLLKLASPTWKI 1 IWLITLEKN*LF^ 

381 411 435 465 495 525 555 ' 585 

TGLSFFLVSLLLSFGIFSLIIPKSN--PKLTKKDFLTKKVIPLNYVALGDSLTEGVGDTTSQGGFVPLLSESLHNRYSYQ 
:: |:= -W 11= =111 h l = = I II = I I I = I I I I I = I I I I = = = I I I I I I = I 1 = = I = =: = l 

45 SFAGFFLLFLLFVGILIFIIPSSHQSSKISDKIRSVKK-EKVTYVAIGDSLTQGVGDSSNQGGFVPVLSQALESDFNWQ 
10 20 30 40 50 60 70 

615 645 675 705 735 765 795 825 

VTSVNYGVSGNTSQQILKRMTTDPQIEKDLEKADLLTLWGGNDVIAVIRKELSHLSIjNSFEKPAFAYKERLKEILA^ 
50 1| ||!::|||| IIHII |::||:|| MINIM h 11= I - h! I I l-ll-h II 

VTPRNYGIAGNTSNQILKRMQEKKDIKRDLKKAKLMTLTVGGNDVIHVIK^ 

90 100 110 120 130 140 150 

855 885 915 945 975 1005 1044 

55 QDNPKLPIYVI^IYNPFYLNFPQLTKMQWIDNWKATKEVVDASENV\'FVPlNDRLYKGINGKEGIT ESSNS 

■■■■\ Illh:|lllllllll|: = hll|::|lll = :MI Hllllhll llllllll 1 = 1 = =1 

KENKTLPIYIIGIYNPFYLNFPEMTEMQTIVDISIWNRSTEEVSKEYDI^/YFVPVNDLLYKGINGKGGVTSSDETSQETKSS 
170 180 190 200 210 220 230 

60 1074 1104 1134 1164 1194 1224 1254 1284 

QAS ITNDALFTGDHFHPNNIGYQIMSNAVMEKINETRKNVIP* FKFLEXGI SLI VGN* PFLHSSDCKSLNSST*A* YRKNF 
I \: II lllllll llllll = h = : = ll = |:| I 

QDSL-NDALFEEDHFHPNNTGYQIMSDAILKRINQTKKEVISC-E 
250 260 270 280 

65 



- 28 ( 5-31) 

SEQ ID 6 (GBS103) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 36 (lane 4; MW 32kDa).

The GBS103-His fusion product was purified (Figure 107A; see also Figure 201, lane 9) and used to
immunise mice (lane 2+3 product; 18.5 µg/mouse). The resulting antiserum was used for Western blot
(Figure 107B), FACS (Figure 107C) and in the in vivo passive protection assay (Table III). These tests
confirm that the protein is immunoaccessible on GBS bacteria and that it is an effective protective
immunogen.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 4 

A DNA sequence (GBSx1316) was identified in S.agalactiae <SEQ ID 3837> which encodes the amino
acid sequence <SEQ ID 3838>. Analysis of this protein sequence reveals the following:

Possible site: 23
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -4.30 Transmembrane 1058 - 1074 ( 1058 - 1075)

Final Results
bacterial membrane --- Certainty=0.2720 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 7> and protein <SEQ ID 8> were also identified. Analysis of this protein 

sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: -13.26
GvH: Signal Score (-7.5): -5.76
Possible site: 41
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -4.30 threshold: 0.0
INTEGRAL Likelihood = -4.30 Transmembrane 489 - 505 ( 487 - 505)
PERIPHERAL Likelihood = 3.71 97
modified ALOM score: 1.36

*** Reasoning Step: 3

Final Results
bacterial membrane --- Certainty=0.2720 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

LPXTG motif: 478-482 

SEQ ID 8 (GBS195) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 24 (lane 8). It was also expressed in E.coli as a GST-fusion product. SDS-PAGE 
analysis of total cell extract is shown in Figure 31 (lane 5). 

GBS195C was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is 
shown in Figure 175 (lane 6 & 7; MW 81kDa). 

GBS195L was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is
shown in Figure 83 (lane 2; MW 123kDa).

GBS195LN was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is 
shown in Figure 83 (lane 3; MW 66kDa). 

GBS195-GST was purified as shown in Figure 198, lane 5. GBS195-His was purified as shown in Figure
222, lanes 4-5. GBS195N-His was purified as shown in Figure 222, lanes 6-7.

The GBS195-GST fusion product was purified (Figure 87A) and used to immunise mice (lane 1 product;
13.6 µg/mouse). The resulting antiserum was used for Western blot (Figure 87B), FACS, and in the in vivo
passive protection assay (Table III). These tests confirm that the protein is immunoaccessible on GBS
bacteria and that it is an effective protective immunogen.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 5 

A DNA sequence (GBSx0002) was identified in S.agalactiae <SEQ ID 4043> which encodes the amino
acid sequence <SEQ ID 4044>. This protein is predicted to be lipoprotein MtsA. Analysis of this protein
sequence reveals the following:

Possible site: 19
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.3361 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9403> which encodes amino acid sequence <SEQ ID 9404> 
was also identified. 

A related DNA sequence was identified in S. pyogenes <SEQ ID 3177> which encodes the amino acid 
sequence <SEQ ID 3178>. Analysis of this protein sequence reveals the following: 

Possible site: 13
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2412 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 146/168 (86%), Positives = 161/168 (94%)

Query: 1 MNLENGIIYSKNIAKQLIAKDPKNKATYEKNRDAWAKLEKLDKEAKSKFNAIPANKKLI 60 

+NLENGIIYSKUIAKQLIAKDPKNK TYEKN AYVAKLEKLDKEAKSKF+AI NKKLI 
Sbjct: 107 LNLENGIIYSKNIAKQLIAKDPKNKETYEKlttKAY\ r AKLEKLDKEAKSKFDAIAENKICLI 166 

45 

Query: 61 VTSEGCFKYFSKAYGVPSAYI!'JEINTEEEGTPDQITSL\TCKLKQVRPSALFVESSVDKRP 120 

VTSEGCFKYFSKAYGVPSAYIWEINTEEEGTPDQI+SL++KLK ++PSALFVESSVD+RP 
Sbjct: 167 VTSEGCFKYFSKAYGVPSAYIWEINTEEEGTPDQISSLIEKLKVIKPSALFVESSVDRRP 226 

Query: 121 MKSVSRESGIPIYAEIFTDSIAKKGQKGDSYYAMMKWNLDKIAEGLAK 168

M++VS++SGIPIY+EIFTDSIAKKG+ GDSYYAMMKWNDDKI+EGLAK 
Sbjct: 227 METVSKDSGI PIYSEI FTDS IAKKGKPGDSYYAMMKWNLDXI SEGLAK 274 

SEQ ID 9404 (GBS679) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 164 (lane 7-9; MW 36kDa) and in Figure 188 (lane 8; MW 36kDa). Purified 
protein is shown in Figure 242, lanes 9 & 10. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 6 

A DNA sequence (GBSx0003) was identified in S.agalactiae <SEQ ID 8485> which encodes the amino 
acid sequence <SEQ ID 8486>. This protein is predicted to be ATP-binding protein MtsB. Analysis of this 
protein sequence reveals the following: 

Possible site: 55
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2097 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 8765> which encodes the amino acid 
sequence <SEQ ID 8766>. Analysis of this protein sequence reveals the following: 

Possible site: 29
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.1929 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 143/238 (60%), Positives = 186/238 (78%), Gaps = 2/238 (0%)

MIISKHLSVSYDNNL-VLEDINLRLEGSGIIGILGPNGAGKSTLMKALLGLVDSTGESGI 59 
MI + +L V+YD N LE IN+ +EG I+GI+GPNGAGKST MKA+L L+D G + 
MITTNNLCVTYDGNSNALEAINVTIEGPS IVGI IGPNGAGKSTFMKAILNLIDYQGHVTV 59 



QVGL F +RPI +LSGGQFQRML+ARCL+QE+DYIFLDEPFVGIDS+SE+IIV+LLK+L 



AGK IL+VHHDLSKV+HYFD+++ILN+HL+A G + + FT + LS AYG+ ++LG+ 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Sbjct:


10 




60 


Sbjct: 


70 




119 


Sb j ct : 


13 0 


Query: 


179 


Sbjct: 


190 



WO 02/34771 



-49- 



PCT/GB01/04789 



Example 7 

A DNA sequence (GBSx0004) was identified in S.agalactiae <SEQ ID 9> which encodes the amino acid
sequence <SEQ ID 10>. Analysis of this protein sequence reveals the following: 

Possible site: 28
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 8 

A DNA sequence (GBSx0005) was identified in S.agalactiae <SEQ ID 11> which encodes the amino acid
sequence <SEQ ID 12>. This protein is predicted to be integral membrane protein MtsC (znuB). Analysis
of this protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 6
McG: Discrim Score: 3.77
GvH: Signal Score (-7.5): -0.47
Possible site: 45
>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood = -10.83 Transmembrane 138 - 154 ( 134 - ?)
INTEGRAL Likelihood = -7.96 Transmembrane 60 - 76 ( 50 - ?)
INTEGRAL Likelihood = -6.?? Transmembrane ? - 111 ( 93 - ?)
INTEGRAL Likelihood = -5.?? Transmembrane ? - 196 ( 174 - ?)
INTEGRAL Likelihood = -4.?? Transmembrane 198 - 214 ( 197 - ?)
INTEGRAL Likelihood = -4.?? Transmembrane 250 - 266 ( 246 - ?)
INTEGRAL Likelihood = -3.?? Transmembrane 222 - 238 ( 221 - ?)
PERIPHERAL Likelihood = 5.?? 116
modified ALOM score: 2.67

*** Reasoning Step: 3

bacterial membrane --- Certainty=0.5331 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related DNA sequence was identified in S.pyogenes <SEQ ID 13> which encodes the amino acid 
sequence <SEQ ID 14>. Analysis of this protein sequence reveals the following: 

Possible site: 45
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -11.25 Transmembrane ? - 154 ( 134 - 163)
INTEGRAL Likelihood = -9.08 Transmembrane 66 - 82 ( 50 - 86)
INTEGRAL Likelihood = -6.79 Transmembrane 95 - 111 ( ? - 118)
INTEGRAL Likelihood = -5.63 Transmembrane 180 - 196 ( 176 - 216)
INTEGRAL Likelihood = -?.73 Transmembrane 221 - 237 ( 218 - ?)
INTEGRAL Likelihood = -?.35 Transmembrane 250 - 266 ( 246 - 268)
INTEGRAL Likelihood = -?.35 Transmembrane 198 - 214 ( 197 - ?)
INTEGRAL Likelihood = -2.81 Transmembrane ? - ? ( 47 - 64)

bacterial membrane --- Certainty=0.5501 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 224/275 (81%), Positives = 255/275 (92%)

Query: 1 MFTKFFEGLLTYHFLQNAFITAIVIGIVAGAVGCFIILRSMSLMGDAISHAVLPGVAISF 60
M KFFEGL++YHFLQNA ITA+VIGIV+GAVGCFIILRSMSLMGDAISHAVLPGVA+SF
Sbjct: 1 MSMKFFEGLMSYHFLQNALITAVVIGIVSGAVGCFIILRSMSLMGDAISHAVLPGVALSF 60

Query: 61 ILGIWFFIGAIVFGLLSSIIITYIKENSVIKGDTAIGITFSSFLALGIILIGLANSTTDL 120
ILG+WFFIGAI+FGLL+S+IITYIKENSVIKGDTAIGITFSSFLALG+ILIG+ANS+TDL
Sbjct: 61 ILGWFFIGAIIFGLLASVIITYIKENSVIKGDTAIGITFSSFLALGVILIGVANSSTDL 120

Query: 121 FHILFGNILAVQDSDKYMTIIVGLIVLTLITIFFKELLLTSFDPVLAKSMGMRVSFYHYL 180
FHILFGNILAVQDSDK++TI V + VL +I++FFKELLLTSFDP+LAKSMG++V+ YHYL
Sbjct: 121 FHILFGNILAVQDSDKWITIGVSIFVLVVISLFFKELLLTSFDPILAKSMGVKVNAYHYL 180

Query: 181 LMILLTLVAVTAMQSVGTILIVALLITPAATAYLYVKSLRTMLFLSSALGAVASVLGLYI 240
LM+LLTLVAVTAMQSVGTILIVALLITPAATAYLY SL+ ML +SS LGA+ASVLGLY+
Sbjct: 181 LMVLLTLVAVTAMQSVGTILIVALLITPAATAYLYANSLKVMLVMSSLLGALASVLGLYL 240

Query: 241 GYTFNIAAGSSIVLTSTFMFLLAFLFSPKQSliFKK 275
GYTFN+AAGSSIVLTS MFL++F SPKQ K+
Sbjct: 241 GYTFNVAAGSSIVLTSAMMFLISFFVSPKQGYLKR 275
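The summary line of each alignment reports counts over the aligned length together with whole-number percentages, which can be recomputed as a simple consistency check. The sketch below is illustrative only: the function name `summary_items` is hypothetical, and it assumes the percentages are truncated rather than rounded. That assumption is consistent with the identity figures quoted in this section (for example 224/275 is 81.45%, printed as 81%), but the exact convention of the alignment program is not stated in the text, so a mismatch does not by itself prove an error.

```python
import re

# Matches "Label = count/length (pct%)" items in a BLAST-style summary line.
ITEM = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def summary_items(line):
    """Return {label: (count, length, reported_pct, truncated_pct)}."""
    items = {}
    for label, count, length, pct in ITEM.findall(line):
        count, length, pct = int(count), int(length), int(pct)
        # Integer division truncates; this is the assumed reporting convention.
        items[label] = (count, length, pct, 100 * count // length)
    return items


row = summary_items("Identities = 224/275 (81%), Positives = 255/275 (92%)")
for label, (count, length, reported, truncated) in row.items():
    print(f"{label}: {count}/{length} reported {reported}%, recomputed {truncated}%")
```

A count that cannot reproduce its own percentage (for instance a denominator that is not a number) is a useful flag for a transcription slip in the summary line.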

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 9 

A DNA sequence (GBSx0006) was identified in S.agalactiae <SEQ ID 15> which encodes the amino acid 
sequence <SEQ ID 16>. Analysis of this protein sequence reveals the following: 

Possible site: 38
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1280 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 10 

A DNA sequence (GBSx0007) was identified in S.agalactiae <SEQ ID 17> which encodes the amino acid 
sequence <SEQ ID 18>. This protein is predicted to be peptidyl-prolyl cis-trans isomerase 10 (rotamase). 
Analysis of this protein sequence reveals the following: 

Lipop: Possible site: 19 Crend: 2
McG: Discrim Score: 5.27
GvH: Signal Score (-7.5): -4.14
Possible site: 19
>>> May be a lipoprotein
ALOM program count: 0 value: 9.34 threshold: 0.0
PERIPHERAL Likelihood = 9.34 89
modified ALOM score: -2.37

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA19257 GB:AL023704 putative Cyclophilin-type peptidyl-prolyl
cis-trans isomerase protein [Schizosaccharomyces pombe]
Identities = 88/224 (39%), Positives = 123/224 (54%), Gaps = 46/224 (20%)

Query: 50 NKKTKQALKADKKAFPQLDKAVAKNEAQ VLIKTSKGDINIKLFPKYAPL 98
N TK L +D+ + + V NE + +I T++GDI + IKL+P+ AP
Sbjct: 419 NMSTKFTL-SDRDVYNEQVLPVTNNEGRQEMGNILLGKAAIIHTTQGDISIKLYPEEAPK 477

Query: 99 AVENFLTHAKEGYYNGLSFHRVIKDFMIQSGDPNGDGTGGKSIWNSKDKKKDSGNGFVNE 158
AV+NF THA+ GYY+ FHR+IK+FMIQ GDP GDGTGG+SIW KKD F +E
Sbjct: 478 AVQNFTTHAENGYYDNTIFHRIIKNFMIQGGDPLGDGTGGESIW KKD FEDE 528

Query: 159 ISPYLYNIRG-SLAMANAGADTNGSQFFINQSQQDHSKQLSDKKVPKVIIKAYSEGGNPS 217
ISP L + R +++MAN+G +TNGSQFFI P
Sbjct: 529 ISPNLKHDRPFTVSMANSGPNTNGSQFFITTDL TPW 564

Query: 218 LDGGYTVFGQVISGMETVDKIASVEVTKSDQPKEKITITSIKVI 261
LDG +T+F + +G++ V +I E K D+P E I +I ++
Sbjct: 565 LDGKHTIFARAYAGLDVVHRIEQGETDKYDRPLEPTKIINISIV 608

A related DNA sequence was identified in S.pyogenes <SEQ ID 19> which encodes the amino acid 
sequence <SEQ ID 20>. Analysis of this protein sequence reveals the following: 

Possible site: 19
>>> May be a lipoprotein

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB88542 GB:AL353818 putative protein [Arabidopsis thaliana]
Identities = 83/186 (44%), Positives = 104/186 (55%), Gaps = 34/186 (18%)

Query: 78 VVMRTSQGDITLKLFPKYAPLAVENFLTHAKKGYYDNLTFHRVINDFMIQSGDPKGDGTG 137
V+M T+ GDI +KL+P+ P VENF TH + GYYDN FHRVI FMIQ+GDP GDGTG
Sbjct: 476 VIMHTTLGDIHMKLYPEECPKTVENFTTHCRNGYYDNHLFHRVIRGFMIQTGDPLGDGTG 535

Query: 138 GESIWKGKDPKKDAGNGFVNEISPFLYHIRG-ALAMANAGANTNGSQFYINQNKKNQSKG 196
G+SIW G F +E L H R L+MANAG NTNGSQF+I
Sbjct: 536 GQSIW GREFEDEFHKSLRHDRPFTLSMANAGPNTNGSQFFITT 578

Query: 197 LSSTNYPKPIISAYEHGGNPSLDGGYTVFGQVIDGMDVVDKIAATSINQNDKPEQDITIT 256
P LD +TVFG+V+ GMDVV I ++ND+P QD+ I
Sbjct: 579 VATPWLDNKHTVFGRVVKGMDVVQGIEKVKTDKNDRPYQDVKIL 622

Query: 257 SIDIVK 262
++ + K
Sbjct: 623 NVTVPK 628
An alignment of the GAS and GBS proteins is shown below:

Identities = 172/267 (64%), Positives = 221/267 (82%)

Query: 1 MKKIIYLGIACVSILTLSGCESIERSLKGDRYVDQKI^NSSKEATEQLNKKTKQALKAD 60
MKK++ L L +S+L LS CES++R++KGD+Y+D+K A+ S+ A++ + ++ALKAD
Sbjct: 1 MKKLLSLSLVAISLLNLSACESVDRAIKGDKYIDEKTAKEESEAASKAYEESIQKALKAD 60

Query: 61 KKAFPQLDKAVAKNEAQVLIKTSKGDINIKLFPKYAPLAVENFLTHAKEGYYNGLSFHRV 120
FPQL K V K EA+V+++TS+GDI +KLFPKYAPLAVENFLTHAK+GYY+ L+FHRV
Sbjct: 61 ASQFPQLTKEVGKEEAKVVMRTSQGDITLKLFPKYAPLAVENFLTHAKKGYYDNLTFHRV 120

Query: 121 IKDFMIQSGDPNGDGTGGKSIWNSKDKKKDSGNGFVNEISPYLYNIRGSLAMANAGADTN 180
I DFMIQSGDP GDGTGG+SIW KD KKD+GNGFVNEISP+LY+IRG+LAMANAGA+TN
Sbjct: 121 INDFMIQSGDPKGDGTGGESIWKGKDPKKDAGNGFVNEISPFLYHIRGALAMANAGANTN 180

Query: 181 GSQFFINQSQQDHSKQLSDKKVPKVIIKAYSEGGNPSLDGGYTVFGQVISGMETVDKIAS 240
GSQF+INQ++++ SK LS PK II AY GGNPSLDGGYTVFGQVI GM+ VDKIA+
Sbjct: 181 GSQFYINQNKKNQSKGLSSTNYPKPIISAYEHGGNPSLDGGYTVFGQVIDGMDVVDKIAA 240

Query: 241 VEVTKSDQPKEKITITSIKVIKDYKFK 267
+ ++D+P++ ITITSI ++KDY+FK
Sbjct: 241 TSINQNDKPEQDITITSIDIVKDYRFK 267
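The "Identities" figure of an alignment is the number of columns in which the Query and Sbjct rows carry the same residue, and it can be recounted directly from any pair of aligned rows. The sketch below is a hedged illustration using the final block (residues 241-267) of the alignment above; the function names are hypothetical. It reproduces only the letters of the middle line; the `+` marks for conservative substitutions depend on a substitution matrix, which is deliberately left out here.

```python
def identity_midline(query, subject):
    """Build a midline showing only identical residues (no '+' for similar ones)."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same number of columns")
    return "".join(q if q == s and q != "-" else " " for q, s in zip(query, subject))


def count_identities(query, subject):
    """Count alignment columns in which both rows carry the same residue."""
    return sum(ch != " " for ch in identity_midline(query, subject))


# Final block (residues 241-267) of the GAS/GBS alignment in this Example.
query = "VEVTKSDQPKEKITITSIKVIKDYKFK"
subject = "TSINQNDKPEQDITITSIDIVKDYRFK"

print(repr(identity_midline(query, subject)))
print(count_identities(query, subject))
```

Summing such counts over all blocks of an alignment should give the numerator of the "Identities" fraction, provided the rows have been transcribed without error.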

SEQ ID 18 (GBS205) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 51 (lane 13; MW 31kDa). 

GBS205-His was purified as shown in Figure 206, lane 8. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 11 

A DNA sequence (GBSx0008) was identified in S.agalactiae <SEQ ID 21> which encodes the amino acid 
sequence <SEQ ID 22>. This protein is predicted to be sporulation protein SpoIIIE (ftsK). Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: -22.83
GvH: Signal Score (-7.5): -7.13
Possible site: 39
>>> Seems to have no N-terminal signal sequence
ALOM program count: 5 value: -9.24 threshold: 0.0
INTEGRAL Likelihood = -9.24 Transmembrane 36 - 52 ( 27 - 60)
INTEGRAL Likelihood = -9.18 Transmembrane 162 - 178 ( 154 - 188)
INTEGRAL Likelihood = -4.04 Transmembrane 597 - 613 ( 595 - 615)
INTEGRAL Likelihood = -3.77 Transmembrane 63 - 79 ( 58 - 83)
INTEGRAL Likelihood = -2.60 Transmembrane 90 - 106 ( 88 - 108)
PERIPHERAL Likelihood = 1.32 136
modified ALOM score: 2.35

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.4694 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
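The transmembrane listing for this protein has a regular row layout (likelihood, core span, outer span in parentheses), so it can be parsed and sanity-checked. The sketch below is an illustration under stated assumptions, not the method used in the original analysis: the class `Segment` and function `parse_segments` are hypothetical names, and the end of the third core span is taken as 613 on the assumption that every core span is 17 residues long, as in the other rows. In this listing the header's "count" equals the number of INTEGRAL rows and its "value" equals the most negative likelihood; the code checks that observation rather than asserting it as a rule of the program.

```python
import re
from dataclasses import dataclass

# One INTEGRAL row: likelihood, core span, then the outer span in parentheses.
ROW = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


@dataclass
class Segment:
    likelihood: float
    start: int
    end: int
    outer_start: int
    outer_end: int

    def is_consistent(self):
        """The core span should be ordered and lie inside the outer span."""
        return self.outer_start <= self.start <= self.end <= self.outer_end


def parse_segments(text):
    """Return one Segment per INTEGRAL row found in the text."""
    return [
        Segment(float(m.group(1)), *(int(g) for g in m.groups()[1:]))
        for m in ROW.finditer(text)
    ]


listing = """
INTEGRAL Likelihood = -9.24 Transmembrane 36 - 52 ( 27 - 60)
INTEGRAL Likelihood = -9.18 Transmembrane 162 - 178 ( 154 - 188)
INTEGRAL Likelihood = -4.04 Transmembrane 597 - 613 ( 595 - 615)
INTEGRAL Likelihood = -3.77 Transmembrane 63 - 79 ( 58 - 83)
INTEGRAL Likelihood = -2.60 Transmembrane 90 - 106 ( 88 - 108)
"""

segments = parse_segments(listing)
print(len(segments), min(s.likelihood for s in segments))
print([s.end - s.start + 1 for s in segments])
```

A row whose core span ends before it starts, or falls outside its outer span, fails `is_consistent()` and is worth re-reading against the source.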

A related GBS nucleic acid sequence <SEQ ID 10035> which encodes amino acid sequence <SEQ ID 
10036> was also identified. 



The protein has homology with the following sequences in the GENPEPT database: 
>GP:CAB13553 GB:Z99112 DNA translocase [Bacillus subtilis]
Identities = 352/822 (42%) , Positives = 508/822 (60%) , Gaps = 70/822 (8%) 

Query: 14 KTRRPTKMIERQRAIQRMITALVIiTIILFFGIIRLGIFGITvYNVIRFMVGSLAYLFIA 73 

K +R ++ + 4Q 14 4 L4 I I++LG+ G T + RF G L 4 
Sbjct: 3 KKKRKSRKKQAKQI1NIKYEI1NGLLCIAISIIAILQLGWGQTFIYLFRFFAGEWFILCLL 62 



Query: 74 ATLIYLYFFKWLRKKDSLV AGFLIASLGLLIEWHAYLFS MPILKDKEIBRST 125 

L+ W +K SL+ AG +L+ H LF ++ ++R+T 

Sbjct: 63 GLLVLGVSLFWKKKTPSLLTRRKAGLYCIIASILLLSHVQLFKNLTHKGSIESASvWNT 122 

Query: 126 ARLIVSDLMQFKITVFAGGGMLGALIYKPIAFLFSNIGAYMIGVLFIILGLFLMSSLEVY 185 

L + D+ + GGGM+GAL++ FLF++ G+ 44 44 I44G4 L44 4 

Sbjct: 123 WELFLMDMNGSSASPDLGGGMIGALLFAASHFLFASTGSQIMAIVMILIGMILVTGRSLQ 182 



Query: 186 DIVE FIR AFKN-- 

4 44 FI4 AF 4 K 4 4 Q4 K4 A 4 4K 44444E 4 4 

Sbjct: 183 ETLKKWMSPIGRFIKEQWIiAFIDDMKSFKSNMQSSKKTKAPSKKQKPARKKQQMEPEPPD 242 
Query: 232 EEAYLASVNVDPETGEILEDQAEDNLDDALPPEVSETSTPVFEP-EILAYETSPQNDPLP 290 

EE 4V4 4 14 +4 H 4+ P 4 4 + PV 4P 4 4 ET Q 4 4 

Sbjct: 243 EEGDYETVSPLIHSEPIISSFSDRNEEEE-SPVIEKRAEPVSKPLQDIQPETGDQ-ETVS 300 

Query: 291 VEPTIYLEDYDSPIPNMRENDEEMVYDLDDDVDDSDIENVDFTPKTTLVYICLPTIDI,FAP 350 

P 4 E 4EN D Y44P44DL A 

Sbjct: 301 APPMTFTE LENKD YEMPSLDLLAD 324 

Query: 351 DKPI<NQSI<EKDLWI<HIRVI,EETFRSFGIDWVERAEIGPSVTKYEIKPAVGVRVNRISN 410 

K Q +K 4 4N R LE TF4SFG4 KV 4 4GP4VTKYE4 P VGV4V44I N 
Sbjct: 325 PKHTGQQADKranYENARiaLERTFQSFGVKAKVTQVHMPAVTKYEVYPDVGVKVSKIVN 384 

Query: 411 LSDDLALALAAKDVRIETPIPGKSLIGIEVPNSEIATVSFRELWEQS-DANPENLLEVPL 469 

I1SDDLALALAAKD4RIE PIPGKS IGIEVPN4E4A VS 4E4 E 4 P4 4 + L 
Sbjct: 385 LSDDLALALAAKDIRIEAPIPGKSAIGIEVPNAEVAMVSLKEVXjESKL^RPDANVLIGL 444 

Query: 470 GKAVNGNARSFNLARMPHLLVAGSTGSGKSVAVNGIISSILMKARPDQVKFMMIDPKMVE 529 

G4 44G A L 4MPHLLVAG4TGSGKSV VNGII4SILM4A4P 4VK MMIDPKMVE 
Sbjct: 445 GRNISGEAVLAEI^KMPHLLVAGATGSGKSVCVNGIITSILMRAKPHEVIO^MIDPKMVE 504 

Query: 530 LSvYNDIPHLLIPVVTNPRKASKALQKAA/DEMF^NRYELFSKIGVRNIAGY^KVEEFNAS 589 

L4VYN IPHLL PWT4P4KAS4AL4KVV4EME RYELFS G RNI GYN 4+ N 
Sbjct: 505 LIWYNGIPHLIAPVVTDPKKASQALKKVVITOMERRYELFSHTGTPJ^IEGYiroYIKRAH]^ 564 

Query: 590 SEQKQIPLPLIWIVDELADLMMVASKEVEDAIIRLGQKARAAGIHMILATQRPSVDVIS 649 

KQ LP IWIVDELADLMMVAS 4VED4I RL Q ARAAGIH4I4ATQRPSVDVI4 
Sbjct: 565 EGAKQPELPYIWITOEIADLMMVASSDVEDSITRLSQMARAAGIHLIIATQRPSVDVIT 624 

Query: 650 GLIKANVPSRIAFAVSSGTDSRTILDENGAEKLLGRGDMLFKPIDENHPVRLQGSFISDD 709 

G4IKAN4PSRIAF4VSS TDSRTILD GAEKLLGRGDMLF P4 N PVR4QG4 F4SDD 
Sbjct: 625 GVIKANIPSRIAFSVSSQTDSRTILDMGGAEKLLGRGDMLFLPVGANKPVRVQGAFLSDD 684 

Query: 710 DVERIVGFIKDQAEADYDDAFDPGEVSETDN3SGGGGGVPESDPLFEEAKGLVLETQKAS 769 

4VE44V 4 Q 4A Y 4 P E 4ET 4 4D L4+EA L44 Q AS 

Sbjct: 685 EVEKWDHVITQQKAQYQEEMIPEETTETHS EVTDELYDEAVELIVGMQTAS 736 



46.5/66.5% over 775aa 

OMNl|NT01BS1964| sporulation protein SpoIIIE Insert characterized 
ORF01349(340 - 2733 of 3048) 

OMNl|NT01BS1964 (6 - 781 of 790) sporulation protein SpoIIIE 
%Match =29.6 

%Identity = 46.4 %Similarity =66.5 

Matches = 352 Mismatches = 243 Conservative Sub.s = 152 
Tm*IATT*S*YTDTG*TKIlMFFHTYSLIKLLR*LYFIINF*IIYKSK**TYTIGTC*K^RIV*HELIEKmiNKY-FT*K



WFMANKKKTKGKKTRRPTKAEIERQRAIQRMIT^ 

I =| := : =1 1= = 1 = I l==ll= II : I 1 I 1=1 = 
VMSVAKKKRKSRKKQAKQLNIKYELNGLLCIAISIIAILQLGWGQTFIYLFRFFAGEWFILCLLGLLVLGV 



FFKHLRKKDSLV AGFLIASLGLLIEWHAYLFSMP:LK DKEILRSTARLIVSDLMQFKITVFAGGGMLGALIY 

= I =1 lb 11= = 1 = I II I =:|:| 1 = 1= = lllhlll:: 

SLFWKKKTPSLLTRRKAGLYCIIASILLLSHVQLFKNLTHKGSIESASVVROTTOLFLMDNmGSSASPDLGGGMIGALLF 



KPIAFLFSNIGAYMIGVLFIILGLFLMSSLEVYDIVE FIR AF--KNKVAEKHEQNKKERFAKREMKKA 

|||:: |: :: :: |::|: |:: : : :: ||: || | : , |; |, | : :| 

AASHFLFASTGSQIMAIWILIGMILOTGRSLQETLKKimSPIGRFIKZQWLAFIDDMKSFKSNMQSSKKTKAPSKKQKP 



984 1014 1044 1074 1104 1134 1164 1194 

IAEQER1ERQKAEEEAYLASVNVDPETGEILEDQAEDNLDDALPPEVSETSTPVFEPEIIAYETSPQNDPLPVEPTIYLE 
,:,,;| : :|| :|: : |: :: | :: | : : : || :| 



250 260 270 280 

1224 1254 1281 1326 1356 1385 1416 

DYDS PI PNMRENDEEMVYDLDD - DVDDSDIENVDFTPKT TLVYKLPTIDIjFAPDKPKNQSKEKDLVRKNIRVLEE 

II : I I I II |::| = :|| = | | | :| = =| | || 

LQDIQPETGDQETVSAPFMTFTELENKDYEMPSLDLLADPKHTGQQADKKNIYENARKLER 

290 300 310 320 330 340 

1446 1476 1506 1536 1566 1596 1626 1656 

TFRSFGIDVKVERAEIGPSVTKYEIKPAVGWVNRISNLSDDLALAIjiUlKI3WIETPIPGKSLIGIEVPNSEIATVSFRE 

II = 111= II : =11=11111= I 11=1==! I I I I I I I I I I I I I I : I I I llllll lllllli'|:| l|::| 
TFQSFGVKAKVTQWLGPAVTKYEVYPDVGVK^SKIWLSDDIAI^^ 

360 370 380 390 400 410 420 

1683 1713 1743 1773 1803 1833 1863 1893 

LWEQS - DANPENLLEVPLGKAWGNARS FNIiARMPHIjLVAGSTGSGKSVAVNGI I S S ILMKARPDQVKFMMI DPKMVELS 
: I : |: : = ||: : = | I I : I 1 I I I I I 1 = 1 I I III I I I I I I = I I I 1 = I = I =1 I 1111111111 = 

VLESKXjNDRPDAWLIGLGRNISGEAVIAELNKMPHLLVAGATGSGKSVC 

440 450 460 470 480 490 500 

1923 1953 1983 2013 2043 2073 2103 2133 

VYITOIPHLLIPVVTNPRKASKAIiQKVVDEMENR^ 

III Mill llll = |:||l = ll = lll = lll llllll I III III := I II II llllllllllll 
VYISIGIPHLIjAPVVTDPKKASQALKKlA/iraMERRYELFSHTG 

520 530 540 550 560 570 580 

2163 2193 2223 2253 2283 2313 2343 2373 

WASKEVEDAIIRLGQKARAAGIHMILATQRPSVDVISGLIKAKVPSRIAFAVSSGTDSRTILDENGAEKLLGRGDMLFK 
llll :|l|:| II I III II I |:|: III lllllll: I: III I: llllll: III |||||lll I I I I I I I I I I I I I 
MVASSDVEDSITRLSQMARAAGIHLIIATQRPSVDVITGVIKAHIPSRIAFSVSSQTDSRTILDMGGAEKLLGRGDMLFL 
600 610 620 630 640 650 660 

2403 2433 2463 2493 2523 2553 2583 2613 

PIDENHPVRLQGSFISDDDVERIVGFIKDQ7AEADYDDAFDPGEVSETDNGSGGGGGVPESDPLFEEAKGLVLETQKASAS 
1= 1 lll = ll = l = lll = ll = :| = I =1 I = I I =11 =1 h = ll h« I II I 

PVGANKPTOVQGAFLSDDEVEKWDHVITQQKAQYQEEMIPEETTET HSEVTDELYDEAVELIVGMQTASVS 



2643 2673 2703 2733 2763 2793 2823 

MIQRRLSVGFI^IRATRL^ffiELEAAGVIGPAEGTKPRKVIJMTPTPSE*EKT^^TRNCRISFLCYI^EAlSIR , 
1=111= =|: II ll== =1 11=11 11=111=11== = = 
A related DNA sequence was identified in S.pyogenes <SEQ ID 23> which encodes the amino acid
sequence <SEQ ID 24>. Analysis of this protein sequence reveals the following:

Possible site: 51
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -9.45 Transmembrane 31 - 47 ( 25 - ?)
INTEGRAL Likelihood = ? Transmembrane 160 - 176 ( 153 - ?)
INTEGRAL Likelihood = -4.?? Transmembrane 93 - 109 ( 86 - ?)
INTEGRAL Likelihood = ? Transmembrane 586 - 602 ( 584 - 604)
INTEGRAL Likelihood = ? Transmembrane 64 - 80 ( 64 - ?)

Final Results

bacterial membrane --- Certainty=0.4779 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>


The protein has homology with the following sequences in the databases: 

!GB:Z99112 DNA translocase [Bacillus subtilis] 601 e-170 
Identities = 354/816 (43%) , Positives = 499/816 (60%) , Gaps = 69/816 (8%) 

11 APKKRLTKAEVEKQRAIKRMILSVL^^LLIFAMi^LGVFGVTTYNMIRFLVGSLAYPFM 70 

A KKR ++ + KQ IK + +L +1 A+L+LGV G T + RF G + 
2 AKKKRKSRKKQAKQLNIKYELNGLLCIAISIIAILQLGWGQTFIYLFRFFAGEWFILCL 61 






71 FAWLIYLFCFKWLRQKDGMI AGWIAFLGLLVEWHAFLFA MPRMLDQDIFLG 122 



183 YD 

+ + F+KE +D + +++ N + K+ + + +K A +KQ E 

182 QETLKKWMSPIGRFIKEQWLAFIDDMK-SFKSNMQSS--KKTKAPSKKQKPARKKQQMEP 238 



344 NQSKEKNLWJCNIKVLEDTFQSFGIDVKVERAEIGPSVTKYEIKPAVGTOVNRISNLADD 403 

Q +K + +N + LE TFQSFG+ KV + +GP+VTKYE+ P VGV+V++I NL+DD 
329 GQQADKKNIYENARKLERTFQSFGVKAKVTQVHLGPAVTKXEVYPDVGVKVSKIVNLSDD 388 

404 LALALAAKDVRIEAPIPGKSLIGIEVPNSEIATVSFRELWEQS-DANPENLLEVPLGKAV 462 

LALALAAKD+RIEAPIPGKS IGIEVPN+E+A VS +E+ E + P+ + + LG+ + 
389 LAUUjAAKDIRIFAPIPGKSAIGIEVPNAEVAWSLKEVLESKLNDRPDANV^ 448 

463 NGNARSFNLARMPHLLVAGSTGSGKSVAVNGIISSILMKARPDQVKFMMIDPKMVELSVY 522 

+G A L +MPHLLVAG+TGSGKSV VNGII+SILM+A+P +VK MMIDPKMVEL+VY 
449 SGFoAVLAEIiNKMPHLLVAGATGSGKSVCTNGIITSILMRAKPHEVKMMMIDPKIWELNW 508 

523 NDIPHLLIPVVTNPRKASKALQKVVDEMENRYELFSKIGVRNIAGYNTKVEEFNASSEQK 582 

N IPHLL PWT+P+KAS+AL+KW+EME RYELFS G RNI GYN ++ N K 
509 NGIPHLLAPVVTDPKKASQALKKVVNEMERRYELFSHTGTRNIEGYNDYIKRANNEEGAK 568 

583 QIPLPLIWIVDELADLI#WASKEVEDAIIRLGQK7ARAAGIHMILATQRPSVDVISGLIK 642 

Q LP IWIVDELADLMMVAS +VED+I RL Q ARAAGIH+I+ATQRPSVDVI+G+IK 
569 QPELPYIWIVDEIMLMMVASSDVEDSITRLSQMARAAGIHLIIATQRPSVDVITGVIK 628 
Query: 643 ANVPSRMAFAVSSGTDSRTILDENGAEKLLGRGDMLFKPIDENHPVRLQGSFISDDDVER 702

AN+PSR+AF+VSS TDSRTILD GAEKLLGRGDMLF P+ N PVR+QG+F+SDD+VE+ 
Sbjct: 629 ANIPSRIAFSVSSQTDSRTILDMGGAEKLLGRGDMLFLPVGANKPVRVQGAFLSDDEVEK 688 

Query: 703 IWFIKDQTEADYDDAFDPGEVSDITOPGFSGNGGAAJ2GDPLFEEAKALVLETQKASASMI 762 

+V+ + Q+AY+ PE++ + D L++EA L++ Q AS SM+ 

Sbjct: 689 WDHVITQQKAQYQEEMI PEETTETHSEVT DELYDEAVELIVGMQTASVSML 740 

Query: 763 QRRLSVGFNRATRLMDELEEAGVIGPAEGTKPRKVL 798 

QRR +G+ RA RL+D +EE GV+GP EG+KPR+VL 
Sbjct: 741 QRRFRIGYTRAARLIDAMEERGWGPYEGSKPREVL 776 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 620/818 (75%) , Positives = 701/818 (84%) , Gaps = 25/818 (3%) 

Query: 1 MVFMANKKKTKGKKTRRPTKAEIERQRAIQRMITALVLTIILFFGIIRLGIFGITVYNVI 60 

MV +KK+ KK R TKAE+E+QRAI+RMI ++++ ++L F ++RLG+FG+T YN+I 
Sbjct: 1 MVKRNQRKKSAPKK--RLTKAEVEKQRAIKRMILSVLMALLLIFAMLRLGVFGVTTYNMI 58 

Query: 61 RFMVGSIAYLFIAATLIYLYFFKWLRKKDSLVAGFLIASLGLLIEWHAYLFSMPILKDKE 120 

RF+VGSIAY F+ A LIYL+ FKWLR+KD ++AG +IA LGLL+EWHA+LF+MP + D++ 
Sbjct: 59 RFLVGSIAYPFMFAWLIYLFCFKWLRQKDGMIAGWIAFLGLLVEWHAFLFAMPRMLDQD 118 

Query: 121 ILRSTARLIVSDLMQFKITVFAGGGMLGALIYKPIAFLFSNIGAYMIGVLFIILGLFLMS 180 

I TARLI DL+ ++T F GGGMLGAL+YKPIAFLFSNIG+Y IG LFI+LGLFLM+ 
Sbjct: 119 IFLGTARLITRDLLALRVTEFVGGGMLGALLYKPIAFLFSNIGSYFIGFLFILLGLFLMT 178 

Query: 181 SLEVYDIVEFIRAFKNKVAEKHEQNRKERFAKREMKKAIAEQERIERQKAEEEAYIASVN 240 

++YD+ F++ +K+A +++NK++RF KRE + AE+E +E+Q EEE LA + 
Sbjct: 179 PWDIYDVSHFVK^VDKLAVAYQENKEKRFIKREEHRLQAEKEALEKQAQEEEKRIAELT 238 

VDPETGEI-l-ED + + + +E T EPEILAY-l-1- -H-D + E Y 

Sbjct: 239 VDPETGEIVEDSQSQ VSYDLAEDMTK--EPEILAYDSHLKDDETSLFDQEDLAYA 291 

Query: 298 ED YDSPIPNMRENDEEMVYDLDDDVDDSDIENVDFTPKTTLVYKLPTIDLFAPDKP 353 

+ YDS + + +++EM D+D+ V+ VDFTPKT L+YKLPTIDLFAPDKP 

Sbjct: 292 HEEIGAYDS-LSALASSEDEM- -DMDEPVE VDFTPKTHLLYKLPTIDLFAPDKP 342 

Query: 354 KNQSKEKDLVRKNIRVLEETFRSFGIDVKVERAEIGPSVTKYEIKPAVGVRVNRISNLSD 413 

KNQSKEK4LWKNI+VLE+TF+SFGIDVKVERAEIGPSVTKYEIKPAVGVRVNRISNL+D 
Sbjct: 343 KNQSKEKNLVRKN1 KVLEDTFQS FGIDVKVERAEIGPSVTKYE IKPAVGVRVNRI SNLAD 402 

Query: 414 DLALALAAKDVRIETPIPGKSL1GIEVPNSEIATVSFRELWEQSDANPENLLEVPLGKAV 473 

DLALALAAKDVRIE PIPGKSLIGIEVPNSEIATVSFRELWEQSDANPENLLEVPLGKAV 
Sbjct: 403 DIALALAAKDVRIEAPIPGKSLIGIEVPNSSIATVSFRELWEQSDANPENLLEVPLGKAV 462 

Query: 474 NGNARSFNIARMPHLLVAGSTGSGKSVAVNGIISSILMKARPDQVKFMMIDPKMVELSVY 533 

NGNARSFNLARMPHLLVAGSTGSGKSVAVNGIISSILMKARPDQVKFMMIDPKMVELSVY 
Sbjct: 463 NGNARSFNIARMPHLLVAGSTGSGKSVAVNGIISSILMKARPDQVKFMMIDPKMVELSVY 522 

Query: 534 NDIPHLLIPVVTNPRKASKALQKVVDEMEl^YELFSKIGVFJJIAGYOT'KVEEFNASSEQK 593 

NDIPHLLIPVVTNPRKASKALQKT^EMEiraYELFSKIGVRMIAGYOTKVEEFNASSEQK 
Sbjct: 523 NDIPHLLIPVVITIPRE<ASKALQKVVDEMErTOYEIjFSKIGVRNIAGYNTKVEEFNASSEQK 582 

Query: 594 QIPLPLIWIVDELADLI#WASKEVEDAIIRLGQKARAAGIHMILATQRPSVDVISGLIK 653 

QIPLPLIWIvDEIADLMMVASKEVEDAIIRLGQKARAAGIHMILATQRPSVDVISGLIK 
Sbjct: 583 QIPLPLIWIVDELADLMMVASKEVEDAIIRLGQKARAAGIHMILATQRPSVDVISGLIK 642 

Query: 654 ANVPSRIAFAVSSGTDSRTILDENGAEKLLGRGDMLFKPIDENHPVRLQGSFISDDDVER 713 

ANVPSR+AFAVSSGTDSRTILDENGAEKLLGRGDMLFKPIDENHPVRLQGSFISDDDVER 
Sbjct: 643 AWPSRMAFAVSSGTDSRT I LDENGAE KLLGRGDMLFKP IDENHPVRLQGSFI SDDDVER 702 

Query: 714 IVGFIKDQAEADYDDAFDPGEVSETDNGSG3GGGVPESDPLF3EAKGLVLETQKASASMI 773 

IV FIKDQ EADYDDAFDPGEVS+ D G G GG E DPLFEEAK LVLETQKASASMI 
Sbjct: 703 IVNFIKDQTEADYDDAFDPGEVSDNDPGFSGNGGAAEGDPLFEEAKALVLETQKASASMI 762 
Query: 774 QRRLSVGFNRATRLMEELEAAGVIGPAEGTKPRKVLMT 811

QRRLSVGFNRATRLM+ELE AGVIGPAEGTKPRKVL T 
Sbjct: 763 QRRLSVGFNRATRLMDELEEAGVIGPAEGTKPRKVLQT 800 

SEQ ID 22 (GBS272d) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 147 (lane 9; MW 55kDa + lane 10; MW 70kDa). It was also expressed in E.coli
as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 147 (lane 11 & 13; MW
85kDa + lane 12; MW 74kDa).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 12 

A DNA sequence (GBSx0009) was identified in S.agalactiae <SEQ ID 25> which encodes the amino acid
sequence <SEQ ID 26>. This protein is predicted to be para-aminobenzoate synthetase (pabB).
Analysis of this protein sequence reveals the following:

Possible site: 61
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4073 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD07357 GB:AE000547 para-aminobenzoate synthetase (pabB) 
[Helicobacter pylori 26695] 
Identities = 204/580 (35%), Positives = 325/580 (55%), Gaps = 50/580 (8%) 








3 KNRSEN+MIVDLLRND+ R+ 



V +A+F+VPIRT++ + + + GVG G+T++S+ EYEE+ 



Query: 421 YLHQLDEKDYRIiKIMLDKTGKOTFSTVTCQLVI^SKKFLTAEVWQDYPI-KLSPFTYFKTS 479 
EK+ L+++L+K GK+ E K L L + E+ + + PIK + FY KT+ 






Sbjct: 410 EKEGVLRVLLNKKGKLIKEYKTLEPLK SLEIRLSEAPIDKRNDFLYHKTT 459 

Query: 480 YRPHIIEGQN EKIFVSPEGLLLETSIGMIVLEKNGRFLTPDLSEGGLNGIYR 531 

Y P + + ++IF + + L E + N+VLE + R LTP S G LNG 

Sbjct: 460 YAPFYQKARALIKKGVMFDEIFYNQDLELTEGARSHLVLEIHNRLLTPYFSAGALNGTGV 519 

Query: 532 RHLLKNQKVIEAPLTLKDLESADAIYACNa.VRGLYPLNLK 571 

LLK V APL L+DL+ A IY NA+ GL + +K 
Sbjct: 520 VGLLKKGLVGHAPLKLQDLQKASKIYCINALYGLVEVKIK 559 

A related DNA sequence was identified in S.pyogenes <SEQ ID 27> which encodes the amino acid 
sequence <SEQ ID 28>. Analysis of this protein sequence reveals the following: 

Possible site: 31
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2669 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 303/572 (52%) , Positives = 406/572 (70%) , Gaps = 1/572 (0%) 

MHIETVIDFKELGKRYRFKNPTKELIADTliEQVLEVIKEVDYYQSQNYYWGYLSYEASA 60
MH +T+IDFKELG+RY F P EL+A +L+QV VT++V +YQ YYWGYLSYEA+A 
MHRKTIIDFKELGQRYLFDEPLVELVAKSLDQVGPVIEKWQHYQQlliGYYvVGYLSYEARA 62 



I +++QGNTYQVNYTL+L+Q+L +D ++Y +L+VEQ AGYNAYIA+D+ ++S SPE 



LFFK++ LTTRPMKGT+ R D E DWL D KKRSENMMIVDLLRNDMG+IC 



G+V+V +LC+VE+Y+TVWQMTSTI G L + L+ I 4AL+PCGSITGAPK+STMAI 






1 LE +PRGIYCG4-1G+C+PDG+ FNVPIRT+Q+ QA YGVGGGITW+S+ 4 



E QK+A L R F L TT +V K+ F +QH+ RL E+A+YFAY +++ +++L 



YRL ILK GK++ + L LS FLTA++ +Q 



3 L G++R+ LL 



+TL DL+ A AI+ 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 13 

A DNA sequence (GBSx0010) was identified in S.agalactiae <SEQ ID 29> which encodes the amino acid
sequence <SEQ ID 30>. Analysis of this protein sequence reveals the following: 

Possible site: 20
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1564 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 31> which encodes the amino acid 
sequence <SEQ ID 32>. Analysis of this protein sequence reveals the following: 

Possible site: 13
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5335 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 220/267 (82%) , Positives = 243/267 (90%) 

LLLEITKIARATYYYQLKKIiNKPNECDKAIKSDIQSIYDEHRGNYGYRRIYLELRNRGFVI 69 
+LLEI ++R+TYYYQ+K+L + +KD +K 1+ IYDEH+GNYGYRRI++ELRNRGFV+ 












H+SYH FLE KGI SMSRKGNSPDNGMME3FFGILKSEMFYGLE +Y+SLD LE+AITD 



YI FYYNNKRI KAKLKGLSPVQYRTKS F 275 
YIFYYNNKRIKAKLKG SPVQYRTKSF 
Sbjct: 241 YIFYYNNKRIKAKLKGFSPVQYRTKSF 267 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 14 

A DNA sequence (GBSx0011; GBSx2234) was identified in S.agalactiae <SEQ ID 33> which encodes the
amino acid sequence <SEQ ID 34>. Analysis of this protein sequence reveals the following: 

Possible site: 27
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3578 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 35> which encodes the amino acid 
sequence <SEQ ID 36>. Analysis of this protein sequence reveals the following: 

Possible site: 25
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3869 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 107/170 (62%) , Positives = 134/170 (77%) 

Query: 1 MKLSYEDKLEIYELRKIGMSWSQISQRYDTOISNLKYMIKL^RYGVEIvEKGRNEYYPP 6 



Query: 61 ELKQEMI DKVL I HGCSQLS VSLDYALSNCS I LTNWLSQFKKNGYTI VEKTRGRPSKMGRK 120 

ELKQE+I+KVLI G SQ SLDYAL S+L+ W++Q+KKNGYTI+EK RGRPSKMGRK 
Sbjct: 61 ELKQEIINKVLIDGQSQKQTSLDYALPTSSMLSRWIAQYKKNGYTILEKPRGRPSKMGRK 120 



Query: 121 R 

RKK EEMTE+ERLQ+E E R ENA LKKLR+ RLRDEA E+QK + 
Sbjct: 121 RKKNLEEWEVERLQKELEYPRAENAVLKKLRErmJlDEAKLKEQQKSFK 170 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 15 

A DNA sequence (GBSx0012) was identified in S.agalactiae <SEQ ID 37> which encodes the amino acid 
sequence <SEQ ID 38>. This protein is predicted to be oxyR protein. Analysis of this protein sequence 
reveals the following: 

Possible site: 22
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1323 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10033> which encodes amino acid sequence <SEQ ID 
10034> was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA91664 GB:Z67753 former trsE (rbcR homolog) [Odontella sinensis] 
Identities = 72/259 (27%), Positives = 127/259 (48%), Gaps = 7/259 (2%)








VL F HP+I+++ 



¥ V GDID+AV G P + + DEL L+IP + 



A related DNA sequence was identified in S. pyogenes <SEQ ID 39> which encodes the amino acid 
sequence <SEQ ID 40>. Analysis of this protein sequence reveals the following: 

Possible site: 30
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -1.28
INTEGRAL Likelihood = -0.27

Final Results

bacterial membrane --- Certainty=0.1510 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 






IRQGESYLDIKQIRYFIAIVENHFNLSQAAELLYVSQPTLSMMINDFEKRENVKLFKRKR 61 
+ +G +DI+ +RYF++IV+N FNLS+A++ LYVSQP LSMMI +FE REN+++FKR 
VLRGWmDIRHmYFVSIVDNDFNLSRaSQmJWSQPALSMMITEFENRENIQIFKRAS 68 



G+IIGLT4 G4NYY+DA++V+ Y+DM L+ KG+I IGIPPL+LS VFS \ 



- K N 4LTI P P+ ELY K+ C +E P+ W+V L R RK Y+ +E YI D LL 



An alignment of the GAS and GBS proteins is shown below: 

Identities = 61/227 (26%) , Positives = 111/227 (48%) , Gaps = 10/227 (4%) 

Query: 9 YLESIELYSNITKAAAHLFISQPYLSJOTIKQLENELEIKLIQ-SQGHQTFLTYAGQRYLF 67 

++ +E + N+++AA L++SQP LS +1 E +KL + +G LTY G Y 
Sbjct: 17 FIAIVENHFNLSQAAELLWSQETLSMMINDFE^ENVKLFKRKRGRIIGLTYLGDNYYK 76 



WO 02/34771 PCT/GB01/04789 



Query: 68 YLKEIDMIERQMAKELYLIRSDKKGEITLGI.NISGIiASSILAKnDPKFNLEHPEISVKLLE 127 

+++ + M +L4- KG I +GI + S + + V+PK LE+P I + E 

Sbjct: 77 DAQKVLSLYDDMFLKLHDHSKGLKGSINIGIPPLILSWFSEWPKLILENPGIQFNVKE 136 

Query: 128 NNQNISEQLVASGDIDLAVGMAPILYKDGIAST-TIYRDELFLMIPTTSQLYNAEKRGQI 186 

+ + G++D+AV ++P D+TIREL + + +LAK+ + 
Sbjct: 137 IGAYQLKNELLVGNVDVAVLLSPTGIADNLVETYEIQRSELSVCLSPRHRL--ASKK--V 192

Query: 187 IPFEYPISVLDNEPLILTPLEYGIGICTIAQFYELHHMSLNQMITTST 233 

I+E L+ELL ++ ++EH+N ++T+S+ 

Sbjct: 193 IQWE DLTDEQLALFDPSFMVHHLVLEACERHQVRPNIILTSSS 235 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
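
The summary line printed with each alignment (for example `Identities = 61/227 (26%)`) gives a count, the aligned length, and a whole-number percentage. The following is a minimal sketch, not part of the original analysis, of how such a line could be read into numbers. The function names `parse_summary` and `exact_percent` are hypothetical, and the regular expression is only assumed to cover the spacing variants seen in these listings.

```python
import re

# One "Name = count/length (pct%)" field; tolerant of the stray spaces
# that appear in the listings (e.g. "(26%) ,").
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_summary(line):
    """Return {field: (count, aligned_length, printed_percent)}."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in FIELD.finditer(line)
    }


def exact_percent(count, length):
    """Unrounded percentage, for comparison with the printed whole number."""
    return 100.0 * count / length


summary = parse_summary(
    "Identities = 61/227 (26%) , Positives = 111/227 (48%) , Gaps = 10/227 (4%)"
)
count, length, printed = summary["Identities"]
print(count, length, printed, round(exact_percent(count, length), 1))
```

The printed percentage is kept exactly as it appears in the listing; the sketch recomputes the unrounded value separately rather than assuming how the original program rounded.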

Example 16 

A DNA sequence (GBSx0013) was identified in S.agalactiae <SEQ ID 41> which encodes the amino acid 
sequence <SEQ ID 42>. This protein is predicted to be aminoacylase (cpsA). Analysis of this protein 
sequence reveals the following: 

Possible site: 43 

N-terminal signal sequence

i 385 - 401 ( 385 - 401) 



Final Results

bacterial membrane --- Certainty=0.1298 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAF3S227 GB:AF168363 aminoacylase [Lactococcus lactis] 
Identities = 201/395 (50%) , Positives = 274/395 (68%) , Gaps = 5/395 (1%) 

LRHQLFEKLDQKCDQMVAIRRYLHENPELSFKETKTAAYISDFYKGKDCHVQTQFGGMNG 65 
L + L L Q ++M+ IRR+LH+ PE+SF+E +T YI FYK DC + G G 

[Remaining alignment rows not recoverable from the source; only the following match-line fragments survive.]



++VDI G K+ 



G +RI+HQPAEEV PGGAK+MI+AG LDG+D ++G+HVM+T++ G i 



++G GGH SMP +ND IVAAS FV QT++SRR++PFD 



3 N I+D V L+GDVR+M E TR V+ ++ K+I G+ T+GV +DY H 



YPHH P F++ E S++++AK++AT 
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 17 

A DNA sequence (GBSx0014) was identified in S.agalactiae <SEQ ID 43> which encodes the amino acid
sequence <SEQ ID 44>. This protein is predicted to be drug transporter. Analysis of this protein sequence
reveals the following:

Lipop: Possible site: -1 Crend: 8

McG: Discrim Score: 6.19

GvH: Signal Score (-7.5): -0.899999

Possible site: 31
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 11 value: -12.15 threshold:
[INTEGRAL Likelihood / Transmembrane rows: likelihood values and residue ranges are illegible in the source]
modified ALOM score: 2.93

*** Reasoning Step: 3

----- Final Results -----

bacterial membrane --- Certainty=0.5861 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB02058 GB:Z79702 hypothetical protein Rv2333c [Mycobacterium tuberculosis]
Identities = 118/405 (29%) , Positives = 199/405 (49%) , Gaps = 9/405 (2%) 





Query: 13 KLLVGIVLAVLSFWLFAQS-ILNMG-PDVQSSLGISSGAMDIGVSSTALFSGLFIWTGG 70
+LL I + F +F + I+N+ PD+Q S + + V+S +L +FI +
Sbjct: 5 QLLTLIATGLGLFMIFLDALIVNVALPDIQRSFAVGEDGLQWWASYSLGMAVFIMSAAT 64

Query: 71 LADKLGRVKFTFIGLCLNIIGSLLIVLANGAVLFIMGRIFQGLAAAFIMPSTMALVKTYY 130
LAD GR ++ IG+ L +GS+ LA + R QGL AA + +++ALV +
Sbjct: 65 LADLDGRRRWYLIGVSLFTLGSIACGLAPSIAVLTTARGAQGLGAAAVSVTSLALVSAAF 124

Query: 131 -DGKDRQRAVSFWSIGSWGGSGLCSYFGGAVASTLGWRYVFIFSI-IASWSFLLILGTP 188
+ K++ RA+ W+ + G+ GG + GWR +F ++ + ++V FL +
Sbjct: 125 PFJU<EKARAIGIWTAIASIGTTTGPTLGGLLVDQWGWRSIFYVNLPMGALVLFLTLCYVE 184

Query: 189 ESKNVGQKTHFDYLGLIIFIISMLSLNIGISMAQEHGLMNVIPLSLFTVMLIGFVLFYYV 248
ES N + FD G ++FI+++ +L + + G +V + + +G LF ++
Sbjct: 185 ESCN-ERARRFDLSGQLLFIVAVGALWAVIEGPQIGOTSVQTIVMLWTAAVGCALFVWL 243

Query: 249 ETRKSNSFIDFHLFENRFY-LGATISNFLIjNAVAGTLIVINTYMQQGRQLTPKVAGEMSL 307
E R SN +D LF + Y L + AV G L++ ++Q R TP V G M L
Sbjct: 244 ERRSSNPMICILTLFRDTSYALAIATICTVFFAVYGMLLLTTQFLQNVRGYTPSVTGLMIL 303

Query: 308 GYLVC^IAIRVGEKILQRFGARKPMLLGAMSTFVGIFIMTLVNIQGPLYLVLVFVGYAL 367
Sbjct: 304 PFSAAVAIVSPLVGHLVGRIGARVPILAGLCMLMLGLLMLIFSEHRSS ALVLVGLGL 360

Query: 368 FGTGLGIYATPSTDTAISSIPNEKVGSASGIYKMASSLGGAIGVA 412 

G+G+ + TP T A++++P E+ G ASGI ++G IG A 

Sbjct: 361 CGSGVALCLTPITTVAMTAVPAERAGMASGIMSAQRAIGSTIGFA 405 



A related DNA sequence was identified in S.pyogenes <SEQ ID 45> which encodes the amino acid
sequence <SEQ ID 46>. Analysis of this protein sequence reveals the following: 

Possible site: 51

>>> Seems to have an uncleavable N-term signal seq

[INTEGRAL rows: likelihood values are illegible in the source; the surviving transmembrane ranges are listed below]
Transmembrane 169 - 185 ( 165 - 189)
Transmembrane 429 - 445 ( 423 - 450)
Transmembrane 203 - 219 ( 200 - 222)
Transmembrane 227 - 243 ( 225 - 245)
Transmembrane 136 - 152 ( 135 - 155)
Transmembrane 302 - 318 ( 299 - 319)
Transmembrane 261 - 277 ( 261 - 277)
Transmembrane 331 - 347 ( 331 - 347)
Transmembrane 351 - 367 ( 351 - 368)

----- Final Results -----

bacterial membrane --- Certainty=0.4312 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 



Query: 1 MSHHQQTVSKQTIMAIIAIALIGFSGILSETSMNVTFPTLMSVYQLPLNSLQWMTTIYLL 60
M Q VS +AI+ +A + F G+L ETSMNVTFPTLM + + LN +QW+TT YLL
Sbjct: 1 MQKDNQPVSLHVKIAILGLAGIAFCG^IETSMNVTFPTLMQQFSISLNKVQWLTTAYLL 60



Query: 61 AVAIMMTTSATLKKNVRERPLFFMATGLFTFGTILAVLTQSFAIMLLARIFQGIGTGLVM 120

VA ++ +A ++K + +FF A LF G I + L +F I+L+ R+ Q + TGL +

Sbjct: 61 LVAATISIAAFIEKRFIFKKIFFWAGLLFIIGVICSALAPNFLILLIGRLIQALSTGLAI 120

Query: 121 PQMFNIILERVPMHKVGLFMGFAGLIISLAPAFGPTYGGFMISHFSWQWIFICILPVPLI 180 

P + I++++P K G +M ++ P+ GPTYGG + SW+ IF +LP+ LI 

Sbjct: 121 PLLITEIMQQIPQKKQGSYMELVEWLLLWQPSLGPTYGGVITQDLSWRLIFWFVLPIGLI 180 

Query: 181 AGILAYYYLEDSPVSEKVPFDWLAFIALSISLTSALLAITSLE-NGSVNLYYLGLFILSF 239 

A ++ ++E K+PF W FI+L ++L S +A+ + G ++ + G +++ 

Sbjct: 181 AWLIGLSFIEQKSSPSKIPFAWKQFISLILALLSITVAVNNAGIYGWTSIKFYGFLLIAV 240 

Query: 240 IL---FLYKNLTAKQPFLDIRILKIPSLTFGLIPFFVFQLINLGINFLTPNFIVMEKIAN 296

IL F+ + ++Q + I I K L+ +F+ Q I L + FL PN+ + 

Sbjct: 241 ILLIVFIKLSTNSRQALISISIFKKWEFVCPLLIYFLIQFIQLSLTFLLPNYAQLILKK3 300 

Query: 297 SSQAGMVLLPGTLLGALLAPAFGKLYDQKGARLSLYLGNALFSLSLIIMTLQTRHFMLLP 356 

+G++LL G+L+ A+L P G++ D ++ L +G S I T+ R+ + 

Sbjct: 301 VMISGIMLLCGSLISAILQPLTGRMLDSFSVKIPLVIGAFFLITSTISFTIFQRYLSVFL 360 



Query: 357 
Sbjct: 

Query: 416 SQAEFTSGVQSVYLLFTIFALLDFI 440 

T G QS Y +L+FI 
Sbjct: 421 GIG--TDGKQSNYTGSRHIFILNFI 443 






An alignment of the GAS and GBS proteins is shown below: 

Identities = 91/369 (24%) , Positives = 160/369 (42%) , Gaps = 14/369 (3%) 

Query: 82 FIGLCLNIIGSLLIVLANGAVLFIMGRIFQGLAAAFIMPSTMALVKTYYDGKDRQRAVSF 141

F+ L G++L VL + ++ RIFQG+ +MP ++ + F 

Sbjct: 83 FMATGLFTFGTILAVLTQSFAIMLLARIFQGIGTGLVMPQMFNIILERVPMHKVGLFMGF 142 

Query: 142 WSIGSWGGSGLCSYFGGAVASTLGWRYVFIFSIIASWSFLLILGTPESKNVGQKTHFDY 201 

+ +GG + S W+++FI + +++ +L E V +K FD+ 

Sbjct: 143 AGLIISIAPAFGPTYGGFMISHFSWQWIFICILPVPLIAGILAYYYLEDSPVSEKVPFDW 202 

Query: 202 LGLIIFIISMLSLNIGISMAQEHGLMNVIPLSLFTVMLIGFVLFYYVETRKSNSFIDFHL 261 

LI IS+ S + 1+ + E+G +K+ L LF ++ F+LF Y F+D + 

Sbjct: 203 LAFIALS ISLTSALLAIT - SLENGSVNLYYLGLF ILSFILFLYKNLTAKQPFLDIRI 258 

Query: 262 FFJTOFYLGATISNFLLl^V-AGTLIVINTYMQQGRQLTPK^AGEMSL-GYLVCVLIAIRV 319 

+ I F+ + G + ++ + AG + L G L+ L+A 

Sbjct: 259 LKIPSLTFGLIPFFVFQLINLGINFLTPNF:Vy.EKIANSSQAGMVLLPGTLLGALLAPAF 318 

Query: 320 GEKILQRFGARKPMLLGAMSTFVGIFLMTLvNIQGPLYLVLVF-VGYALFGTGLGIYATP 378 

G K+ + GAR + LG + + +MTL Q +++L F + Y LF G + 

Sbjct: 319 G-KLYDQKGARLSLYLGNALFSLSLIIMTL---QTRHFMLLPFTLLYILFTFGRNMGFNN 374 

Query: 379 STDTAISSIPNEKVGSASGIYKMASSLGGAIGVATSIAIYHAFSGNADFHKAALCGLILN 438 

S TAI +P EK A+ I++M GA+G A + I ++ A+F +L 
Sbjct: 375 SIATAIRELPAEKNADATAIFQMMQQFAGALGTAMASLIANS QAEFTSGVQSVYLLF 431 

Query: 439 LVFCSLSIL 447 

+F L + 
Sbjct: 432 TIFALLDFI 440 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 18 

A DNA sequence (GBSx0015) was identified in S.agalactiae <SEQ ID 47> which encodes the amino acid 
sequence <SEQ ID 48>. This protein is predicted to be transposase. Analysis of this protein sequence 
reveals the following: 

Possible site: 45 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3116 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
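
Each localisation report in these Examples ends with three "Final Results" lines, one per compartment, each with a certainty value and a verdict. The following is a minimal sketch, not part of the original analysis, of how those lines could be read and the highest-certainty compartment selected. The names `parse_final_results` and `predicted_location` are hypothetical, and the pattern deliberately tolerates the stray spaces that scanning introduced into some of the numbers.

```python
import re

# "bacterial <compartment> ... Certainty=<value> (<verdict>)", allowing
# spaces inside the number such as "0. 3 116" or "0 . 0000".
LINE = re.compile(
    r"bacterial\s+(membrane|outside|cytoplasm)\W*"
    r"Certainty\s*=\s*([\d .]+?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return {compartment: (certainty, verdict)} for one report."""
    results = {}
    for m in LINE.finditer(text):
        certainty = float(m.group(2).replace(" ", ""))
        results[m.group(1)] = (certainty, m.group(3).strip())
    return results


def predicted_location(results):
    """Compartment with the highest certainty value."""
    return max(results, key=lambda name: results[name][0])


report = """
bacterial cytoplasm Certainty=0. 3 116 (Affirmative) < succ>
bacterial membrane --- Certainty=0 . 0000 (Not Clear) < succ>
bacterial outside Certainty=0 . 0000 (Not Clear) < succ>
"""
results = parse_final_results(report)
print(predicted_location(results), results)
```

Choosing the maximum certainty is an assumption of this sketch; the reports themselves mark the selected compartment with the verdict "Affirmative".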
Example 19

A DNA sequence (GBSx0016) was identified in S.agalactiae <SEQ ID 49> which encodes the amino acid
sequence <SEQ ID 50>. This protein is predicted to be L11 protein (rplK). Analysis of this protein
sequence reveals the following:

Possible site: 21

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1859 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 



Query: 1 MAKKVEKLVKLQIPAGKATPAPPVGPALGQAGINIMGFTKEFNARTADQAGMIIPVVISV 60

MAKKVEK+VKLQIPAGKA PAPPVGPALGQAG+NIMGF KEFNART +QAG+IIPV ISV
Sbjct: 1 MAKKVEKVVKLQIPAGKANPAPPVGPALGQAGVNIMGFCKEFNARTQEQAGLIIPVEISV 60

Query: 61 YEDKSFDFITKTPPAAVLLKKAAGVEKGSGEPNKTKVATITRAQVQEIAETKMPDLNAAN 120

YED+SF FITKTPPA VLLKKAAGVEKGSGEPNK KVAT+T+ QV+EIA+TKMPDLNAA+
Sbjct: 61 YEDRSFTFITKTPPAPVLLKKAAGVEKGSGEPNKMKVATVTKDQVREIAQTKMPDLNAAD 120

Query: 121 LESAMRMIEGTARSMGFTV 139

E+AMR+IEGTARSMG TV
Sbjct: 121 EEAAMRIIEGTARSMGITV 139

A related DNA sequence was identified in S.pyogenes <SEQ ID 51> which encodes the amino acid 
sequence <SEQ ID 52>. Analysis of this protein sequence reveals the following: 
Possible site: 45 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4276 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 136/141 (96%) , Positives = 139/141 (98%) 

Query: 1 MAKKVEKLVKLQIPAGKATPAPPVGPALGQAGINIMGFTKEFNARTADQAGMIIPVVISV 60

MAKKVEKLVKLQIPAGKATPAPPVGPALGQAGINIMGFTKEFNARTADQAGMIIPVVISV
Sbjct: 25 MAKKVEKLVKLQIPAGKATPAPPVGPALGQAGINIMGFTKEFNARTADQAGMIIPVVISV 84

Query: 61 YEDKSFDFITKTPPAAVLLKKAAGVEKGSGEPNKTKVATITRAQVQEIAETKMPDLNAAN 120

YEDKSFDFITKTPPAAVLLKKAAGVEKGSG PN TKVAT+TRAQVQEIAETKMPDLNAAN
Sbjct: 85 YEDKSFDFITKTPPAAVLLKKAAGVEKGSGTFNTTKVATVTRAQVQEIAETKMPDLNAAN 144

Query: 121 LESAMRMIEGTARSMGFTVTD 141 

+E+AMRMIEGTARSMGFTVTD 
Sbjct: 145 IEAAMRMIEGTARSMGFTVTD 165 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
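
The "Identities" figure reported with each alignment is a count of aligned positions at which the Query and Sbjct rows carry the same residue; in the middle (match) row these positions are printed as the residue letter, while "+" marks a conservative substitution. The following is a minimal sketch, not part of the original analysis, that counts identities for the final block of the GAS and GBS alignment above (Query 121-141, Sbjct 145-165). The function names are hypothetical.

```python
def count_identities(query, subject):
    """Count aligned columns where both rows carry the same residue."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")


def identities_from_match_row(match_row):
    """Count identities from the middle row, where letters mark identical residues."""
    return sum(1 for c in match_row if c.isalpha())


query = "LESAMRMIEGTARSMGFTVTD"      # Query residues 121-141
match_row = "+E+AMRMIEGTARSMGFTVTD"  # middle row as printed
subject = "IEAAMRMIEGTARSMGFTVTD"    # Sbjct residues 145-165

identical = count_identities(query, subject)
print(identical, "of", len(query), "positions identical")
```

Counting "Positives" would additionally require the substitution matrix used by the search program, which the text does not state, so the sketch does not attempt it.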
Example 20

A DNA sequence (GBSx0017) was identified in S.agalactiae <SEQ ID 53> which encodes the amino acid 
sequence <SEQ ID 54>. This protein is predicted to be ribosomal protein L1 (rplA). Analysis of this protein
sequence reveals the following: 
Possible site: 30 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2285 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 



Query: 1 R^KKSKTOLiRAALEKIDSTKAYSVEEA.VA]AKETNFAKFDATVEVSYNLNIDVKKADQQIR 60 

MAKK K A 4- +D 4 KAY V EAVAL K+TN AKFDATVEV4-+ L +D K QQIR 
Sbjct: 1 MAKKGKKYVEAAKLVDHSKAYDVSEAVA^ 60 

Query: 61 GAMVLPAGTGKTSRVLVFARGAKAEEAKAAGADFVGEDDLVAKIQGGWLDFDVVIATPDM 120

GA+VLP GTGKT RVLVFA+G KA+EA+AAGADFVG+ D + KIQ GW DFDV++ATPDM
Sbjct: 61 GAVVLPNGTGKTQRVLVFAKGEKAKEAEAAGADFVGDTDYINKIQQGWFDFDVIVATPDM 120

M VG+4-GRVLGP+ LMPNPKTGTVT +V KA+ E K GK+ YR DKAGN+ IGKVSF 
Sbjct: 121 MGEVGKIGRVLGPKGLMPNPKTGTVTFEVEKAIGEIKAGKVEYRVDKAGNIHVPIGKVSF 180 

Query: 181 DDAKLVDNFKAFNDVIVKAKPATAKGTYITNLSITTTQGVGIKVDPNS 228 

+D KLV+NF D I+KAKPA AKG Y+ N+++T+T G G+KVD ++ 
Sbjct: 181 EDEKLVENFTTMYDTILKAKPAAAKGVYVKNVAVTSTMGPGVKVDSST 228

A related DNA sequence was identified in S.pyogenes <SEQ ID 55> which encodes the amino acid 
sequence <SEQ ID 56>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2309 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 208/229 (90%) , Positives = 220/229 (95%) 

Query: 1 MAKKSKNLRAALEKIDSTKAYSVEEAVALAKETNFAKFDATVEVSYNLNIDVKKADQQIR 60

MAKKSK +RAALEK+DSTKAYSVEEAVAL KETNFAKFDA+VEV+YNLNIDV+KADQQIR
Sbjct: 1 MAKKSKQMRAALEKVDSTKAYSVEEAVALVKETNFAKI^ 60

Query: 61 GAMVLPAGTGKTSRVLVFARGAKAEEAKAAGADFVGEDDLVAKIQGGWLDFDVVIATPDM 120

GAMVLP GTGKT RVLVFARGAKAEEAKAAGADFVGEDDLVAKI GGWLDFDVVIATPDM
Sbjct: 61 GAMVLPNGTGKTQRVLVFARGAKAEEAKAAGADFVGEDDLVAKINGGWLDFDVVIATPDM 120

Query: 121 MALVGRLGRVLGPRNLMPNPKTGTVTMDVAKAVEESKGGKITYRADKAGNVQALIGKVSF 180

MA+VGRLGRVLGPRNLMPNPKTGTVTMDVAKAVEESKGGKITYRADKAGNVQALIGKVSF
Sbjct: 121 MAIVGRLGRVLGPRNLMPNPKTGTVTMDVAKAVEESKGGKITYRADKAGNVQALIGKVSF 180

Query: 181 DDAKLVDNFKAFNDVIVKAKPATAKGTYITNLSITTTQGVGIKVDPNSL 229

D KLV+NFKAF+DV+ KAKPATAKGTY+ N+SIT+TQGVGIKVDPNSL
Sbjct: 181 DADKLVENFKAFHDVMAKAKPATAKGTYMANVSITSTQGVGIKVDPNSL 229

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 21 

A DNA sequence (GBSx0018) was identified in S.agalactiae <SEQ ID 57> which encodes the amino acid 
sequence <SEQ ID 58>. Analysis of this protein sequence reveals the following: 

Possible site: 25 

>>> May be a lipoprotein

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10029> which encodes amino acid sequence <SEQ ID
10030> was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04286 GB:AP001509 nickel transport system (nickel-binding 
protein) [Bacillus halodurans] 
Identities = 209/541 (36%) , Positives = 324/541 (59%) , Gaps = 14/541 (2%) 

Query: 5 RRNILLSITCLLMVTLTACHSQDS KSHKLNSDK-LTLAWGEDFGDVNPHRYNPDQF 59
R+ ILL + L+ L C +S + N++K +T +W D G +NPH YNP Q
Sbjct: 6 RKLILLFVISLISSILVGOffiSESGWSNEGEENTEKSITFSWPRDIGPMNPHVXNPSQL 65

Query: 60 VIQDMVYEGLVRYGDNGKIEPALAKSWSISQDGKTYTFKLRNA-KYSDGSNFNAANVKRN 118
Q M+YE LV Y + G+++P LA SW+IS+DGK YTFKLR ++SDG+ FNA VK+N

[Remaining alignment rows not recoverable from the source; only the following match-line fragments survive.]



I- TF++ LK+ Y L DL+++RP+RFL ++ 



FP DT++ +K+PIGTG W++ KQ4EY F RN NYWG+ PK+ +VTVK+IPDA+ 



TR IAFESG++DLI+G G+I +D F Q + 4Y T +S+P+ TR L 



VR A4+H +K ++ + G E+ AD I S + ++D + P Y+V4-+AN LD+AGW 



K VREK+G+ L L L Y T 



Y +L L +++V++P++Y 
A related DNA sequence was identified in S.pyogenes <SEQ ID 59> which encodes the amino acid
sequence <SEQ ID 60>. Analysis of this protein sequence reveals the following: 

Possible site: 24 
>>> May be a lipoprotein

----- Final Results -----

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 131/497 (26%), Positives = 220/497 (43%), ( 



Query: 8 ILLSITCLLMVTLTACHSQDSKSHKLN SDKLTLAWGEDFGDVNPHRYNP-DQFVI 61

I L +T L++V AC Q ++ + D+L ++ G PH ++P D++

Sbjct: 13 ITLFLTGLILV---ACQQQKPQTKERQRKQRPKDELWSMGAKL PHEFDPKDRYGV 65

Query: 178 SAFPKG - - DDTTKKNVKKP I GTGQWWKS KKQNE YI TFKRNENYWGKKPKLKEVTVKVI P 235

PK +D K N PIG+G ++VK K E P EH + GKKP K+ T V+
Sbjct: 174 - - VPKKHYNDKYKSN PIGSGPYMVKEYKAGEQAI FVRNPYWHGKKPYFKKWT-WVLL 227

Query: 236 DAQTRALAFESGDVDLIYGNGIIGLDTFAQYTK DKKYVTAISQPMSTRLLLLNAKE 291

D T A ESGDVD+IY + D + T+ V +S P +++++ +

Sbjct: 228 DENTALAALESGDVDMIYATPELA-DKKVKGTRLLDIPSNDVRGLSLPYVKKGVITDSPD 286

Query: 292 SIFQDKKWQAMNHAIDKVSIAKNTFRGTEKPADTIFSKSTSHSDAKLNPYSYN 345

+ D +R+A+ +++ + G KPA +1 K T + K

Sbjct: 287 GYPVGNDVTSDPAIRKALTIGLNRQKVLDTVIiNGYGKPAYSIIDK-TPFWNPKTAIKDNK 345

Query: 346 VDKANQLLDQAGWKMGKDKVREKDGKTLTLRLPYIATKATDKDLVTYFQGEWRKIGINVS 405

V KA QLL +AGWK D R+K L Y +L + + +GI +

Sbjct: 346 VAKAKQLLTKAGWKEQADGSRKKGDLDAAFDLYYPTIJDQLRANLAVEVAEQAKALGITIK 405

Query: 406 LIAMEEDDYWANAKKGNFDMMLTYSWGAPKD?HA™SALTAKADHGHPENIALENLATKT 455

LA W +DLY+G + S+AGNINTT

Sbjct: 406 LKASN WDEMATKSHDSALLYAGGRHEAQQFYESHHPSLAGKGW-TNITFYNNPTVT 460

Query: 466 E-MDRLIKSALVDPKEE 481

+ +D+ + S+ +D E
Sbjct: 461 KYLDKAMTS SDLDKANE 477



A related GBS gene <SEQ ID 8469> and protein <SEQ ID 8470> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: 22 Crend: 5 
McG: Discrim Score: 7.69 
GvH: Signal Score (-7.5): -3.34 

Possible site: 25 
>>> May be a lipoprotein

ALOM program count: 0 value: 7.21 threshold: 0.0 
PERIPHERAL Likelihood = 7.21 273 
modified ALOM score: -1.94 

*** Reasoning Step: 3 
----- Final Results -----

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 



ORF02080(391 - 1905 of 2223)

EGAD|8250|EC3476(21 - 520 of 524) nickel-binding periplasmic protein precursor {Escherichia coli}
OMNI|NT01EC4139 oligopeptide transporter putative substrate binding domain, putative
SP|P33590|NIKA_ECOLI NICKEL-BINDING PERIPLASMIC PROTEIN PRECURSOR.
GP|404845|emb|CAA51659.1||X73143 NikA {Escherichia coli}
GP|466612|gb|AAB18451.1||U00039 nikA {Escherichia coli}
GP|1789887|gb|AAC76501.1||AE000423 periplasmic binding protein for nickel {Escherichia coli}
PIR|S39594|S39594 nickel-binding periplasmic protein precursor - Escheri
%Match =26.9 

%Identity = 41.3 %Similarity =63.7 

Matches =208 Mismatches = 175 Conservative Sub.s = 113 

30 

147 177 207 237 267 297 327 357 

SP*IIDTYTLSQSWSHNFLLRRMQNQYNVGNTSSVDYHKIiXX*LIXXXCLKK*LTKLKRKLVKMRRNIL 



LTACHSQDSKSHKI^SDKLTLAWGEDFGDWPHRYNPDQF\fIQDI4VYEGLWYGDNGKIEPAIAKSWSISQDGKTYTFKL 
I |»:| II : I :||| I 1 = 1 I Mil Ihl :| = I 11111= 1=1111=11 I 
FALIACASFIVHAAAPDEITTAWPVNVGPLNPHLYTPNQMFAQSMVYEPLVKYQADGSVIPWIiAKSWTHSEDGKTWTFTL 



624 654 684 714 744 774 304 834 

RN-AKYSDGSNFNAANVKPJSIFDSIFSKSNRGNHNWFNLTNQLENYRAI^NQSTFEIKLKQAYSATLYDLSMIRPIRFLSDS 
I: |:|:| 1 = 1 II = = = II Ihl 11= = =11 = - = = l II II I = l = = II lh= I 

45 RDDVKFSNGEPFDAEAAAENFRAVL- -DNRQRHAWLELANQIVDVKALSKTELQITLKSAYYPFLQELALPRPFRFIAPS 

100 110 120 130 140 150 160 



864 894 924 954 984 1014 1044 1071 

AFPKGDDTTKKWKKPIGTGQWWKSKKQNEYITFKRNE^IYWGKKPKLKEvTVKVIPDAQTRA]^FESGDVDLIYGN-GI 
50 | : | :| HIM |: = : | |:| I 1)11111 = 11 ■ \ ■ ■ \ III I I I I = I I I = I I = I I = I I I I = 

QF- - KNHETMNGIKAPIGTGPWILQESKLNQYDVFVRNSNYWGEKPAI KKITFNVI PDPTTRAVAFETGDIDLLYGNEGL 
180 190 200 210 220 230 240 



1101 1131 1161 1191 1221 1251 1281 1311 

55 IGLDTFAQYTKDKKYVTAISQPMSTRLLLMAKESIFQDKiC\7RQAKNHAIDKVSIAKNTFRGTEKPADT 

; HIM::::: | | :|||: | :| || :: : ||:|:|:|::| |: I = |l = = 111 = 1= I = = = 
LPLDTFARFSQNPAYHTQLSQPIETVMLALNTAKAPTNEIAVREALNYAVNKK^ 

260 270 280 290 300 310 320 

60 1341 1371 1395 1425 1455 1485 1515 1545 
KLNPYSYNVDKANQLLDQAGWKM--GKDKVREKDGKTLTLRLPYIATKA? 

111=11 ll = = lll = III :|||:h I = I =1 I I I = I = 1 = 11 =1111 II = 
GLKPSQYDPQKAKALLEKAGWTLPAGKD-IREl^NGQPLRIELSFIGTDALSKSMAEIIQADMRQIGADVSLIGEEESSIY 

340 350 360 370 380 390 400 

65 

1575 1605 1635 1665 1695 1725 1755 1785 



ANAKKGNFDMMLTYSWGAPTOPHATOSALTAKADHGHPENIALENLATKT 
I = I I | = : :||||:||||r.-| = = I = h II I =h I I I h =1 II

ARQRDGRFGMI FHRTWGAPYDPHAFLSSM - - - RVPS HAD FQAQQGLADKPL Z DKE I GEVLATHDETQRQALYRDILTRLH 



1815 1845 1875 1905 1935 1965 1995 2025 

DEAVYIPLTYQSVISVYRKGDFKTMRFAPEENSFPLRYIEKNNVSK*FDHQKNIVSFFGIVFHITSNIYSYQTINS*FSR 
11111 = 1-1 l = = II::: =11 h 1 = 

DEAVYLPISYISMMW-SKPELGNIPYAPIATEIPFEQIKPVKP 



There is also homology to SEQ ID 318. An alignment of the GAS and GBS sequences follows: 

Identities = 44/185 (23%) , Positives = 78/186 (41%) , Gaps = 27/186 (14%) 

Query: 65 VITQMV-DGLLENDEYGNLVPSLAKDWKVSKDGLTYTYTLRDGVSWYTADGEEYAPVTAE 123 

VI MV +GL+ + G 4- P+LAK W +S+DG TYT+ LR+ +DG + + 

Sbjct: 57 VIQDMVYEGLVRYGDNGKIEPALAKSWSISQDGKTYTFKLRNA KYSDGSNFNAANVK 113 

Query: 124 DFVTGLKHAVIJDKSDALYvVEDSIK^KAYQNGETOFKEVGvTCALDDKTVQYTLNKPESY 183 

4 + + + + + ++N +AL+ T + L ++Y 

Sbjct: 114 RNFDSIFSKSNRGNHNMFNLTNQLEN YRALNQSTFEIKLK- -QAY 156 

Sbjct: 157 

Query: 240 NYWDAK 245 

NYW K 
Sbjct: 215 NYWGKK 220 

SEQ ID 8470 (GBS186) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 35 (lane 7; MW 60kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 41 (lane 6; MW 85.7kDa). 

GBS186-GST was purified as shown in Figure 202, lane 4. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 22 

A DNA sequence (GBSx0019) was identified in S.agalactiae <SEQ ID 61> which encodes the amino acid 
sequence <SEQ ID 62>. Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -5.95 Transmembrane 101 - 117 ( 99 - 123) 

INTEGRAL Likelihood = -4.73 Transmembrane 276 - 292 ( 275 - 293) 

INTEGRAL Likelihood = -1.12 Transmembrane 232 - 248 ( 232 - 248) 

INTEGRAL Likelihood = -0.96 Transmembrane 151 - 167 ( 150 - 169) 



Final Results 

bacterial membrane --- Certainty=0.3378 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
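
The INTEGRAL rows of the report above each give a likelihood score and the residue range of a predicted transmembrane segment, followed in parentheses by an extended range. The following is a minimal sketch, not part of the original analysis, of how those rows could be read and ordered along the sequence. The names `Segment` and `parse_segments` are hypothetical, and treating the most negative likelihood as the strongest candidate is an assumption of this sketch rather than a statement from the text.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float
    start: int
    end: int


# "INTEGRAL Likelihood = <score> Transmembrane <start> - <end>"; the
# parenthesised extended range that follows is ignored here.
ROW = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+"
    r"Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text):
    """Return one Segment per INTEGRAL row found in the report text."""
    return [
        Segment(float(m.group(1)), int(m.group(2)), int(m.group(3)))
        for m in ROW.finditer(text)
    ]


report = """
INTEGRAL Likelihood = -5.95 Transmembrane 101 - 117 ( 99 - 123)
INTEGRAL Likelihood = -4.73 Transmembrane 276 - 292 ( 275 - 293)
INTEGRAL Likelihood = -1.12 Transmembrane 232 - 248 ( 232 - 248)
INTEGRAL Likelihood = -0.96 Transmembrane 151 - 167 ( 150 - 169)
"""
segments = parse_segments(report)
ordered = sorted(segments, key=lambda s: s.start)        # N- to C-terminal order
strongest = min(segments, key=lambda s: s.likelihood)    # most negative score
print([(s.start, s.end) for s in ordered], strongest)
```

In this report every core segment spans 17 residues, which the sketch can confirm from the parsed ranges.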

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04287 GB:AP001509 nickel transport system (permease) 
[Bacillus halodurans] 
Identities = 119/304 (39%) , Positives = 174/304 (57%) 

Query: 5 SSIIKKILSAFLALFFISLLTFILIKLSTVNSAENYLRLSKISVSPEALKEAEHYLGLDK 64 
Query: 65 PLWKQYWLWFQKALTGDFGYSYVLRLPVLDLVLQRFIATLFLGTSAFLLIVTISTPLGVW 124

P+ QY K DFG+SYV PV D V R ATL L S4 L V IS PLG 

Sbjct: 63 PMAVQYVQTIVKVFQLDFGHSYVTNQPWVDEVTARMPATLQIAVS 



Query: 125 AGLHESARSDHLIRFLSFSSVSMPNFWVAYLLMLLFSAKLNLLPVSGGNDLQSLILPSIT 184

+ +++++ D R LS+ S+P FW+ YLL+ FS KLNL PV G L+LP++T
Sbjct: 123 SAIYKNSLIDRFSRLLSYLGASIPQFWLGYLLIFFFSVKLNLFPVEGRGSWAHLVLPTVT 182

Query: 185 LSFSTVGQYIALIRKAISQENRSLNVENARLRGVKERYIVTHHLLRNALPAIMTALSLTW 244 

LS + + Y L+R ++ +4- + V AR RG+KE+ 1+ H+L+ A+ ++T L + 
Sbjct: 183 LSrALIAIYTRLLRASVLEQMQESYVLYARTRGIKEKVIWKHVLKIAISPVITGLGMNV 242 

LLTG+IIVE++FSW G GR FV ++ D+PVIQ +L+ LF+ N + + 
Sbjct: 243 GKLLTGTIIVEQVFSWPGFGRYFVDAIFNRDIPVIQCYVLLAACLFIVCNLIVDLVQLAM 302 

Query: 305 DPRL 308 
DPR+ 

Sbjct: 303 DPRI 306



A related DNA sequence was identified in S.pyogenes <SEQ ID 63> which encodes the amino acid 
sequence <SEQ ID 64>. Analysis of this protein sequence reveals the following: 

Possible site: 40 
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -7.27
INTEGRAL Likelihood = -6.37
INTEGRAL Likelihood = -5.89
INTEGRAL Likelihood = -5.26
INTEGRAL Likelihood = -2.39
Transmembrane 145 -

----- Final Results -----

bacterial membrane --- Certainty=0.3909 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 102/324 (31%) , Positives = 167/324 (51%) , Gaps = 28/324 (8%) 



IIKKILSAFLALFFISLLTFILIKLSTVN- - -SAENYLRLSKISVSPEALKEAEHYLGLD 63 

II KI+ +F +S+LTF+L+K S V+ ++ NY S++P K H+ GLD 

I IWKI I RCVTL I FGVS VLTFVLLKQSPVDPVMASVNY DTSLTPAQYKAI AHHYGLD 6 3 



Sbjct: 

Query: 64 KPLWKQYWLWFQKALTGDFGYSYVLRLPVLDLVLQRFLATLFLGTSAFLLIVTISTPLGV 123

KP QY++W + + GD G S V R PV D++ R A+ L +++L I LG 
Sbjct: 64 KPALVQYFIWLKNVIQGDLGTSLVYRQPVSDIIRSRAGASFILMGLSWILSGLIGFILGT 123 

Query: 124 WAGLHESARSDHLIRFLSFSSVSMPNFWVAYLLMLLFSAKLNLLPVSGGNDL 175 

+ H+ D ++R+ S+ +S+P FW+ + +L+FS +L P+ + + 
Sbjct: 124 LSAFHQGKLLDRWRWFSYLQISVPTFWIGLIFLLIFSVQLGWFPIGISSPIGTLSQDIT 183 

Query: 176 QSLILPSITLSFSTVGQYIALIRKAISQENRSLNVENARLRGVKERYIVTHHLLR 230

+ L+LP TLS + R + S V AR RG + I HH LR 

Sbjct: 184 I^RVKHLMLPVFTLSILGIAWWLHTRTKMMSVLSSEYVLFARARGETQWQIFKHHCLR 243 

Query: 231 NALPAIMTALSLTWVY LLTGS I IVEEI FSWNGIGRLFVTSLRTSDLPVIQACMLI FG 287 

N AI+ A++L + Y L GS++ E++FS+ G+G + SD P4+ A ++I G 

Sbjct: 244 N AIVPAITLHFSYFGELFGGSVIAEQVFSYPGLGSTLTEAGLKSDTPLLLAIVMI-G 299 

Query: 288 TLFL-ANNFMTQCFMNWVDPRLRK 310 

TLF+ A N + + ++P+LR+ 

Sbjct: 300 TLFVFAGNLIADILNSIINPQLRR 323 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 23 

A DNA sequence (GBSx0020) was identified in S.agalactiae <SEQ ID 65> which encodes the amino acid
sequence <SEQ ID 66>. This protein is predicted to be nickel transport system (permease). Analysis of this 
protein sequence reveals the following: 

Possible site: 14 



>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = [illegible] Transmembrane [illegible] - 189 ( 169 - [illegible])
INTEGRAL Likelihood = [illegible] Transmembrane [illegible] - 110 ( 86 - [illegible])
INTEGRAL Likelihood = [illegible] Transmembrane [illegible] - 237 ( 221 - [illegible])
INTEGRAL Likelihood = [illegible] Transmembrane [illegible] - 134 ( 118 - [illegible])
INTEGRAL Likelihood = [illegible]

----- Final Results -----

bacterial membrane --- Certainty=0.4057 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04288 GB:AP001509 nickel transport system (permease) 

[Bacillus halodurans] 
Identities = 103/239 (43%), Positives = 157/239 (65%)

Query: 6 AIFAPILSSFDPQYVDLSQKLIAPNNVHLLGTDQLGRDVLSRLLYGARYSLFLAIIISLL 65 

AI AP ++ DP V+L+ KLL P+ + LGTDQLGR LSRLL+GAR SL A +1 + 
Sbjct: 29 AILAPWIAPHDPIQVNLMiKLLPPSWEYPLGTDQLGRCNLSRLLFGARVSLGFATLIFIS 88 

Query: 66 ELTIGMFVGLIVGWYQGKLENLFLWIANIILAFPSFLLSLATVGILGHGLGNLIFAIVFV 125

L IG+- VG I G+ G ++++ + ++AFP+ +L L VG+ G GL ++ A+V V 

Sbjct: 89 SLGIGLLVGAIAGYRGGWIDSVLMRFCEGVMAFPNLVLVLGLVGLFGPGLWQVVLALVMV 148 

Query: 126 EWVYYAKLMTNLWSAKKEPWINAQIMGLSVWHILRKHIFPFVYQPILWWLMNIGNII 185 

+WVYYA++ 4++ S K+4- ++ A+I G S W I+R+HI P V PI+V+ + +G I 
Sbjct: 149 QWVYYARMFRSMIVSLKEQNFITAARISGSSPWKIIRRHIIPNVLPPIWIGTLEMGWAI 208 

Query: 186 LMISGFSFLGIGVQPNVTEWGMMLHDARGYFRTATVIMMLSPGIAIFLTVFSFNTLGDAI 244 

+ IS SFLG+G+QP EWG M+H+ + + R+ +ML PGI I L V +FN LG+++ 
Sbjct: 209 MDISALSFLGLGIQPPTPEWGAMIHEGKSFIRSHPELKLYPGIMILLWMTFNVLGESL 267 



A related DNA sequence was identified in S.pyogenes <SEQ ID 67> which encodes the amino acid 
sequence <SEQ ID 68>. Analysis of this protein sequence reveals the following: 

Possible site: 39 



>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -7.80 Transmembrane 182 - 198 ( 180 - 204)
INTEGRAL Likelihood = -7.38 Transmembrane 77 - 93 ( 69 - [illegible])
INTEGRAL Likelihood = -7.05 Transmembrane [range partly illegible; surviving values: 128, 132]
INTEGRAL Likelihood = -6.16 Transmembrane 8 - 24 ( 7 - 31)
INTEGRAL Likelihood = -5.10 Transmembrane 239 - 255 ( 235 - 258)

----- Final Results -----

bacterial membrane --- Certainty=0.4121 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 61/246 (24%), Positives = 127/246 (50%), Gaps = 1/246 (0%) 
Query: 2 LVISAIFAPILSSFDPQYVDLSQKLLAPNNVHLLGTDQLGRDVLSRLLYGARYSLFLAII 61

L++S + + P + + + LAP+ HL GTD LGRD+ R + G +SL + ++

Sbjct: 19 LILSIIALNLYFYRTPLETNAALRNLAPSKSIHLFGTDGLGRDMFWTIKGLYFSLQVGLL 78

Query: 62 ISLLELTIGMFVGLIVGWYQGKLENLFLWIANIILAFPSFLLSLATVGILGHGLGNLIFA 121

+L+ + + G++ G ++ + W+ ++ + P + + ++G G +1 A

Sbjct: 79 GALMGVFLATVFGVLAGLGNSLIDKIIAVJLVDLFIGMPHLIFMILISFWGKGAQGVI1A 138

Query: 122 IVFVEWVYYAKLMTNLVKSAKKEPOTINAQIMGI^^ 181

W A+L4 N V K + +V ++ MG + ++I+R HI P + I + ++
Sbjct: 139 TAVTHWPSLARIjlRNEVYDLKMKAFVQLSKSMGKTPYYIVRHHILPIjIASQIFIGFILLF 198

Query: 182 GNIILMISGFSFLGIGVQPWTEIK^LHDARGYFRTAT-WMMLSPGIAIFLTVFSFNTL 240
++IL + +FLG G+ G++L +A + W++4 PG+ + h V +F+T+

Sbjct: 199 PHVILHFASMTFLGFGLSAEQPSVGIILSEARKHISLGHWWLVIFPGLYLILWNRFDTI 258

Query: 241 GDAIDK 246

Sbjct: 259 GESLKK 264



A related GBS gene <SEQ ID 8473> and protein <SEQ ID 8474> were also identified. Analysis of this
protein sequence reveals the following:



Lipop: Possible site: -1 Crenel: 0 
McG: Discrim Score: 7.56 
GvH: Signal Score (-7.5): -1.15 

Possible site: 14 
>» Seems to have a cleavable N-term signal seq. 
ALOM program count: 5 value: -7.64 threshold: 



INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 



Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 



Transmembrane 
Transmembrane 

Transmembrane 
Transmembrane 
145 



- 73 ( 51 - 

- 189 ( 169 - 

- 110 ( 86 - 

- 237 ( 221 - 

- 134 ( 118 - 



*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.4057 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:

ORF02082(2S2 - 1053 of 1365)
EGAD|89511|HP0300(23 - 283 of 285) dipeptide ABC transporter, permease protein (dppC)
{Helicobacter pylori} OMNI|HP0300 dipeptide ABC transporter, permease protein (dppC)
GP|2313398|gb|AAD07369.1||AE000548 dipeptide ABC transporter, permease protein (dppC)
{Helicobacter pylori 26695} PIR|D64557|D64557 dipeptide ABC transporter, permease protein -
Helicobacter pylori (strain 26695)
%Match = 20.5
%Identity = 43.4 %Similarity = 63.3
Matches = 111 Mismatches = 92 Conservative Sub.s = 51



30 60 90 120 150 180 210 240



P*KCLTCDNDST*LDLGLLINRINYC*RNFFKEI^K^ 



TKVKGEIISKRIYFSSSLLVLLVISAIFAPILSSFDPQYVDLSQKLLAP -NNVHLLGTDQLGRDVLSRLLYGARY 

=::|lll 11111=1= II : =11 I I =11111 1111=1111=1111 

EFIQQFKKNKAAWGAWIVLLLVICAIFAPLLAPHDPYVQNAQORLLKPIKEHGGNAICYLLGTDDLGRDILSRLIYGARI
SLFIAIIISLLELTIGMFVGLIVGWYQGKLEmFLWIOTI^

|| : |: : : | :||| |:: || : : : | :|::| : || || : | :|| | | ::|| || :|:|: 
SLTIGIVSMGIAVFFGTILGLIAGYFGGKTDA:iMRIMDIKFALPSILLIVIV\rAVLGPSLraAMLAIGFVGIPGFARLV 



TNLWSAKKEPWINAQIMGLSVWHILRKHIFPFVYQPILWrinLM^IGNIILMISGFSFLGIGVQPNVTEWGMN 

= I h: III = = l II ■■ I III l = = l I : =1 : =1111 = 1 II III II = = 

RSSVLGEKEKEYVIASKINGSSHLRLMCKVIFPNCIIPLIVQTTMGFASTVLEAAALSFLGLGAQPPKPEWGAMLMNSMQ 



972 1002 1032 1059 1089 1119 1149 

YFRTATWMMLSPGIAIFLTVFSFNTLGDAI -DKKDWKRQWNS * K* ENCHYR* ERSLY*EI LWK* I WENR * LLLVRW 
I II 11== 11= Mill III =1111 I II 
YIATAPWMLVFPGVMIFLTVMSFNLVGDGIMDALDPKRTS 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 24 

A DNA sequence (GBSx0021) was identified in S.agalactiae <SEQ ID 69> which encodes the amino acid 
sequence <SEQ ID 70>. This protein is predicted to be peptide ABC transporter, ATP-binding protein. 
Analysis of this protein sequence reveals the following: 

Possible site: 60
>>> Seems to have no N-terminal signal sequence
161 - 177 ( 161 - 177)

Final Results

bacterial membrane --- Certainty=0.1128 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10027> which encodes amino acid sequence <SEQ ID 
10028> was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF73561 GB:AE002315 peptide ABC transporter, ATP-binding 
protein [Chlamydia muridarum] 
Identities = 86/253 (33%) , Positives = 154/253 (59%) , Gaps = 2/253 (0%) 

Query: 1 METTMEQLEIRKLSLQIGEVPVLRDFSCKIDMGESLTIIGESGSGKTLLAKLLVGHIPQG 60
M T+ ++E ++++ ++ SI +SL ++GE+GSGKT ++K ++G +P
Sbjct: 1 MSKTLLKIENLWAIKESNQRLVNHLSLTIKQRQSLALVGENGSGKTTVSKAILGFLPDN 60

Query: 61 MTVR-GNIFFKGVDLGKLTVKQWQKLRGRDIAYLVQNPMSMFNPFQKIEAHILETILSHE 119
++ G IF+ G D+ +L+ K++Q +RG+ I+ + QN M P ++ I+ET+ H
Sbjct: 61

Query: 120
f YPFELSGGM QR+ +A L+ +P++II DEP
Sbjct: 121

Query: 180
+L+++ QNN L+ +TH+ L +L
Sbjct: 181

Query: 239 ILSNPQHNYTKAL 251
+h +P H YT+ L
Sbjct: 241 LLRSPSHPYTQKL 253

A related DNA sequence was identified in S.pyogenes <SEQ ID 71> which encodes the amino acid
sequence <SEQ ID 72>. Analysis of this protein sequence reveals the following:

Possible site: 55
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -2.50 Transmembrane 168 - 184 ( 167 - 184)
INTEGRAL Likelihood = -1.70 Transmembrane 211 - 227 ( 211 - 227)

Final Results

bacterial membrane --- Certainty=0.1999 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
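
The "Final Results" listing assigns a certainty to each candidate location, and the location carrying the highest certainty is the one marked Affirmative. A minimal sketch of reading such a listing is given below; it is illustrative only, the function names are hypothetical, and the pattern is deliberately tolerant of the stray spaces that occur in scanned copies of these listings.

```python
import re
from typing import Dict, Optional, Tuple

# One result line, e.g. "bacterial membrane --- Certainty=0.1999 (Affirmative) < succ>".
_RESULT = re.compile(
    r"bacterial\s+(\w+)[\s\-\u2014]*Certainty\s*=\s*(\d\s*\.\s*\d+)\s*\(([^)]*)\)"
)


def parse_final_results(text: str) -> Dict[str, Tuple[float, str]]:
    """Return {location: (certainty, verdict)} for every result line found."""
    results: Dict[str, Tuple[float, str]] = {}
    for m in _RESULT.finditer(text):
        certainty = float(re.sub(r"\s+", "", m.group(2)))  # "0. 1999" -> 0.1999
        results[m.group(1)] = (certainty, m.group(3).strip())
    return results


def predicted_location(results: Dict[str, Tuple[float, str]]) -> Optional[str]:
    """Location with the highest certainty, or None if nothing was parsed."""
    if not results:
        return None
    return max(results, key=lambda loc: results[loc][0])


listing = """
bacterial membrane --- Certainty=0.1999 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0 . 0000 (Not Clear) < succ>
"""
parsed = parse_final_results(listing)
print(predicted_location(parsed), parsed["membrane"])
```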

An alignment of the GAS and GBS proteins is shown below: 

Identities = 87/232 (37%) , Positives = 138/232 (58%) , Gaps = 3/232 (1%) 

Query: 23 LRDFSCKIDMGESLTIIGESGSGKTLLAKLLVGHIPQ-GMTVRGNIFFKGVDLGKL-TVK 80
+R+ S ++ GE L +GESGSGK++L K G + G G+I ++G +L L T K
Sbjct: 28 IRNVSLELVEGEVLAFVGESGSGKSVLTKTFTGMLESNGRIANGSIVYRGQELTDLKTNK 87

Query: 81 QWQKLRGRDIAYLVQNPMSMFNPFQKIEAHILETILSHEKCSKRVALSKALEWMKRLNLD 140
+W K+RG IA + Q+PM+ +P + I + I E I+ H+K S A AL++M ++ +
Sbjct: 88 EWAKIRGSKIATIFQDPMTSLSPIKTIGSQITEVIIXHQKVSHAKAKEMALDYMNKVGIP 147

Query: 141 DAISLLKKYPFELSGGMLQRIMLATILSLDPQVIILDEPTSAVDCHNCSTISAILQELQN 200
+A + YPFE SGGM QRI++A L+ P ++I DEPT+A+D + I +L+ LQ
Sbjct: 148 NAKKRFEDYPFEYSGGMRQRIVIAIALACRPDILICDEPTTALDVTIQAQIVELLKSLQR 207

Query: 201 NGK-TLITVTHDYQLARDLGGQLLVISEGEWEQGQTQAILSNPQHNYTKAL 251
T+I +THD + + ++ V+ GE+VE G+I +P+H YT +L
Sbjct: 208 EYHFTIIFITHDLGWASIADKVAVMYAGEIVEFGTVEEIFYDPRHPYTWSL 259

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 25 

A DNA sequence (GBSx0022) was identified in S.agalactiae <SEQ ID 73> which encodes the amino acid
sequence <SEQ ID 74>. This protein is predicted to be peptide ABC transporter, ATP-binding protein.
Analysis of this protein sequence reveals the following:

Possible site: 50
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10025> which encodes amino acid sequence <SEQ ID 
10026> was also identified. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAB05797 GB:AP001514 oligopeptide ABC transporter (ATP-binding
protein) [Bacillus halodurans]
Identities = 82/199 (41%), Positives = 130/199 (65%), Gaps = 2/199 (1%)

Query: 19 RQEVLKDCHFHLKRGEIIGIMGKSGSGKSSLARLIIGLDSPTCGSIYFQG-KIYTPKDGK 77
+Q++L F + GE +GI+G+SGSGKS+L RL++G++ P G IYF+G K+
Sbjct: 21 KQKILNHISFECRHGECLGIIGESGSGKSTLGRLLLGIEKPDRGHIYFEGNKVEERSVRS 80

Query: 78 AQIILVFQDALSSVNPYFSIEEILNEAFYGKKTT-FELCQILEAVGLDGTYLKYKARQLS 136
I VFQD SS+NP+F++E + E GKK ++ +L+ VGL +Y K +LS
Sbjct: 81 GNISAVFQDYTSSINPFFTVETAIMEPLKGKKAAKSKVDYLLKQVGLHPSYKKKYPHELS 140

Query: 137 GGQLQRVCIARALLLKPKIIIFDESLSGLDPVTQIKMLRLLQKIKRRYELSFIMISHDPK 196
GG++QRVCIARA+ +PK I+ DE++S LD Q ++L LL ++KR Y++S++ I+HD +
Sbjct: 141 GGEVQRVCIARAISTEPKCIVLDEAISSLDVSIQTQVLDLLIELKRIYQMSYLFITHDIQ 200

Query: 197 ICQAICNRVFLIKNGYLVE 215
IC+R+ + ++G + E
Sbjct: 201 AAAYICDRIMIFRHGQIEE 219
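
Each Query or Sbjct row carries its own check: the number of residues in the row, gaps excluded, should equal the end coordinate minus the start coordinate plus one. The sketch below applies that check to a single row. It is an illustration under stated assumptions (the function name is hypothetical, and "-" is taken to be the only gap character), not a description of how the alignments above were produced.

```python
import re
from typing import Optional

# "Query: 197 ICQAICNRVFLIKNGYLVE 215" -> label, start, sequence, end
_ROW = re.compile(r"^\s*(Query|Sbjct):\s*(\d+)\s+(\S+)\s+(\d+)\s*$")


def row_is_consistent(row: str) -> Optional[bool]:
    """True if the residue count matches the coordinates, False if it does not,
    None if the line is not a complete alignment row."""
    m = _ROW.match(row)
    if m is None:
        return None
    start, seq, end = int(m.group(2)), m.group(3), int(m.group(4))
    residues = sum(1 for ch in seq if ch != "-")  # gap characters carry no coordinate
    return residues == end - start + 1


print(row_is_consistent("Query: 197 ICQAICNRVFLIKNGYLVE 215"))
print(row_is_consistent("Sbjct: 201 AAAYICDRIMIFRHGQIEE 219"))
```

A row that fails the check usually points to a transcription problem in either the sequence or one of its coordinates.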

A related DNA sequence was identified in S.pyogenes <SEQ ID 75> which encodes the amino acid
sequence <SEQ ID 76>. Analysis of this protein sequence reveals the following:

Possible site: 60
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3195 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 91/238 (38%), Positives = 137/238 (57%), Gaps = 21/238 (8%)

Query: 1 MKEI FLMLVCNHVGKTFGRQ EVLKDCHFHLKRGEIIGIMGKSGSGKSSLARLIIGL 56
ME + L +H+ TF ++ E +KD H+ +G+I GI+G SG+GKS+L R+I L
Sbjct: 1 MNEA1 IQL- -DHIDITFRQKKRVIFAVKDVTVHINQGDIYGIVGYSGAGKSTLVRVINLL 58

Query: 57 DSPTCGSI YFQGKIYTPKDGKAQ IILVFQ--DALSSVNPYFSIEEILNE 103
+PT G I + QGKI D Q I ++FQ + ++ ++ L
Sbjct: 59 QAPTNGKITVDGD VTFDQGKIQLSADALRQKRRDIGMI FQHFNLMAQKTAKENVAFALRH 118

Query: 104
Sbjct:

Sbjct: 178

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 26

A DNA sequence (GBSx0023) was identified in S.agalactiae <SEQ ID 77> which encodes the amino acid
sequence <SEQ ID 78>. This protein is predicted to be UMP kinase (pyrH). Analysis of this protein
sequence reveals the following:

Possible site: 18
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1935 (Affirmative) < succ>

The protein has homology with the following sequences in the GENPEPT database:

s subtilis]

Query: 2 EPKYQRILIKLSGEALAGDKGVGIDIPTVQSIAKEIAEVHHSGVQIALVIGGGNLWRGEP 61
+PKY+RI++KLSGEALAG++G GI+ +QSIAK++ E+ V++A+V+GGGN +
Sbjct: 3 KPKTKRIVLKLSGEALAGEQGNGINPTVIQSIAKQVKEIAELEVEVAVWGGGNYGAEKT 62

Query: 62
Sbjct: 63

Query: 122
LEK R+V+F AG G+PYFSTDTTAALRAAEIEA+ ILMAKN VDGVYNADP+KD +AVK+
Sbjct: 123

Query: 182
L++++V+K GL++MD+TAS++ MDNDI L+VF++ E GNIKR V+GE IGT V K
Sbjct: 183 JSLSYLDVLKDGLEVMDSTASSLCMDNDIPLIVFSIMEEGNIKRAVIGESIGTIVRGK 240

A related DNA sequence was identified in S.pyogenes <SEQ ID 79> which encodes the amino acid
sequence <SEQ ID 80>. Analysis of this protein sequence reveals the following:

Possible site: 18
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1955 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 224/242 (92%) , Positives = 233/242 (95%) 

Query: 1 MEPKYQRILIKLSGEALAGDKGVGIDIPTVQSIAKEIAEVHNSGVQIALVIGGGNLWRGE 60 

+EPKYQRILIKLSGEAIAG+KGVGIDIPTVQ+IAKEIAEVH SGVQIALVIGGGNLWRGE 
Sbjct: 1 VEPKyQRILIKLSGEALAGEKGVGIDIPTVQAIAKEIAEVHVSGVQIALVIGGGNLWRGE 60 

Query: 61 PAAEAGMDRVQADYTGMLGTVMNALVNIADSLQQYGVDTRVQTAIPMQTVAEPYWGRALR 120 

PAA+AGMDRVQADYTGMLGTVMNALVMADSLQ YGVDTRVQTAI PMQ VAEPY+RGRALR 
Sbjct: 61 PAADAGMDRVQADYTGMLGTVMNALVMADSLQHYGVDTRVQTAIPMQNVAEPYIRGRALR 120 

HLEKNRIWFGAGIGSPYFSTDTTAALRAREIEA+AIimKNGVDGVYNADPKKDANAVK 
Sbjct: 121 HLEKSKIWFGAGIGSPYFSTDTTAABRAAEIEADAILMAKNGVDGVYNADPKKDANAVK 180 

Query: 181 FDELTHVEVIKRGLKIMDATASTISMDITOIDLVVFISMNETGNIKRVVLGEQIGTTVSNKA 240 

FDELTH EVIKRGLKIMDATAST+SMDNDIDLWFNMNE GNI+RW GE IGTTVSNK 
Sbjct: 181 FDELTHGEVIKRGLKIMDATASTLSMDNDIDLWFNMNEAGNIQRWFGEHIGTTVSNKV 240 

Query: 241 SE 242 

Sbjct: 241 CD 242 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 27

A DNA sequence (GBSx0024) was identified in S.agalactiae <SEQ ID 81> which encodes the amino acid 
sequence <SEQ ID 82>. Analysis of this protein sequence reveals the following: 

Possible site: 22
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3712 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 28 

A DNA sequence (GBSx0025) was identified in S.agalactiae <SEQ ID 83> which encodes the amino acid 
sequence <SEQ ID 84>. This protein is predicted to be ribosome recycling factor (frr). Analysis of this 
protein sequence reveals the following: 
Possible site: 34
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3522 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:



Sbjct: 

Query: 61 EARVLLISPFDKSSIKDIERAINESDLGINPANDGSVIRLVIPALTEETRRDLAKEVKKV 120
EAR+L+I PFDKSSI DIERAI +SDLG+ P+NDG+VIR+ IP LTEE RRDL K VKK
Sbjct: 61 EARLLVIQPFDKSSISDIERAIQKSDLGLTPSKDGWIRITIPPLTEERRRDLTKLVKKS 120

Query: 121 GENAKIAIRNIRRDAMDEAKKQEMICE-TEDDLKSLEKDIQKATDDAVKHIDEMTANKEK 180

E AK+A+RNIRRDA D+ KK++K+ E+TEDDL+ + +D+QK TD ++ ID+ KEK 
Sbjct: 121 AEEAKVAVRNIRRDANDDLKKRQKDGELTEDDLRRVT3DVQKLTDKYIEQIDQKAEAKEK 180 

Query: 181 ELLEV 185 

Sbjct: 181 EIMEV 185 

A related DNA sequence was identified in S.pyogenes <SEQ ID 85> which encodes the amino acid 
sequence <SEQ ID 86>. Analysis of this protein sequence reveals the following: 

Possible site: 21
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4462 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 160/185 (86%), Positives = 171/185 (91%)

Query: 1 MTKEIVTKAQERFEQSHQSLSREFAGIRAGRANASLLDRIQVEYYGAPTPLNQLASITVP 60
M I+ A+ERF QSHQSLSRE+A IRAGRANASLLDRIQV+YYGAPTPLNQLASITVP
Sbjct: 1 MANAIIETAKERFAQSHQSLSREYASIRAGRANASLLDRIQVDYYGAPTPLNQLASITVP 60

Query: 61
Sbjct: 61

Query: 121
GENAKIAIRNIRRDAMD+AKKQEK KEITED+LK+LEKD~QKATDDA+K ID MTA KEK
Sbjct: 121

Query: 181
Sbjct: 181



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 29 

A DNA sequence (GBSx0026) was identified in S.agalactiae <SEQ ID 87> which encodes the amino acid 
sequence <SEQ ID 88>. Analysis of this protein sequence reveals the following: 

Possible site: 44
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1356 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10023> which encodes amino acid sequence <SEQ ID 
10024> was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12943 GB:Z99109 yitL [Bacillus subtilis] 
Identities = 107/269 (39%) , Positives = 155/269 (56%) , Gaps = 6/269 (2%) 

Query: 42 LVTDENKDF-YFIQKDGFTFALSKSEGEHHIGEM--VKGFAYTDMQQKARLTTKETFATR 98 

L D DF YF+ T L SE I + V+ F Y D Q++ T K + 

Sbjct: 25 LSIDHQTDFGYFLTDGEDTILLHNSEMTEDIEDRDEVEVFIYVDQQERLAATMKIPIISA 84 

Query: 99 DHYGWGlVTEWKDLGVFLDTGLPDKQvWSLDVLPELKELWPKKGDRLYVCLDVDKKDR 158 

D YGW V + +D+GVF+D GL K +V+ + LP +++WP+KGD+LY L V + R 
Sbjct: 85 DEYGWVEVTOKVEDMGVFVDVGL-SKDALVATEHLPPYEDWPQKGDKLYCMLKVTNRGR 143 

Query: 159 LWALPADPEVFQRMATPAYNNMQNQNWPAIVYRLKLSGTFVYLPENNMLGFIHPSERYSE 218 

++A PA ++ + T A ++ N+ VYRL SG+FV + ++ + FIHPSER E 

Sbjct: 144 MFAKPAPEDIISELFTDASEDLMNKELTGTVYRLIASGSFV-ITDDGIRCFIHPSERKEE 202

Sbjct: 203 PRLGSRVTGRVIQVKE-DGSVNLSLLPRKQDAMSVDAECILTYI>1RMRNGAMPYSDKSQPD 261

Query: 279 EIKATFGISKGQFKKALGGLMKAKKIKQD 307
+I+ F +SK FK+ALG LMK K+ Q+
Sbjct: 262 DIRERFNMSKAAFKRALGHLMKNGKVYQE 290

A related DNA sequence was identified in S.pyogenes <SEQ ID 89> which encodes the amino acid 
sequence <SEQ ID 90>. Analysis of this protein sequence reveals the following: 

Possible site: 51
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0811 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 235/284 (82%) , Positives = 265/284 (92%) 



Query: 31 

MN LLATVITGL+ +EN + YFI K+GFTF LSK+EGE IG+MV GFAYTD++QKARLT 
Sbjct: 1 ^LLATVITGLIKEENANDYFIHKEGFTFTLSKAEGERQIGDMVTGFAYTDIEQKARLT 60 

Query: 91 TKETFATRDHYGWGTVTEVRKDLGVFIjDTGLPDKQVVVSLDVLPELKELWPKKGDRLYVC 150 

TKE +TR YGWG VTEVR+DLGVF+DTG+P+K++WSLDVLPE+KELWPKKGD+LY+ 
Sbjct: 61 TKEIRSTRTSYGWGEVTEVRRDLGVFVDTGIPNKEIWSLDVLPEMKELWPKKGDKLYIR 120 

Query: 151 LDVDKKDRLWALPADPEVFQRmTPAYNNMQNQNWPAIVYRLKLSGTFVYLPENNMLGFI 210 

LDVDKKDR+W LPA+PEVFQ+MA+PAYNNMQNQ+WPATVYRLKL+GTFVYLPENNMLGFI 
Sbjct: 121 LDVDKKDRIWGLPAEPEVFQKMASPAYNNMQNQHWPAIVYRLKLTGTFVYLPENNMLGFI 180 

Query: 211 HPSERYSEPRLGQVLDARVIGFREVDRTLNLSLKPRSFEMLENDAQMILTYLESNGGFMT 270 

H SERY+EPRLGQVLDARVIGFREVDRTLNLSLKPRSFEMLENDAQMI+TYLE+NGGFMT 
Sbjct: 181 HSSERYAEPRLGQVLDARVIGFREVDRTLNLSLKPRSFEMLENDAQM1VTYLEANGGFMT 240 

Query: 271 LNDKSSPEEIKATFGISKGQFKKALGGLMKAKKIKQDQLGTELL 314 

LNDKSSPEEIKA+FGISKGQFKKALGGLMKAK+IKQD GTEL+ 
Sbjct: 241 LNDKSSPEEIKASFGISKGQFKKALGGLMKAKRIKQDATGTELI 284 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 30 

A DNA sequence (GBSx0028) was identified in S.agalactiae <SEQ ID 91> which encodes the amino acid 
sequence <SEQ ID 92>. This protein is predicted to be peptide methionine sulfoxide reductase (msrA). 
Analysis of this protein sequence reveals the following: 

Possible site: 33
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0866 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10021> which encodes amino acid sequence <SEQ ID
10022> was also identified.

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAB05167 GB:AP001512 peptide methionine sulfoxide reductase 
[Bacillus halodurans] 
Identities = 102/173 (58%) , Positives = 126/173 (71%) , Gaps = 2/173 (1%) 





Query: 14 ENDMEFAIFAGGCFWCMVQPFEELDGIESVLSGYTGGHVENPTYKEVCSKTTGHTEAVEI 73
E+ A FAGGCFWCMV PFEE GI V+SGYTGGH ENPTYKEVCS+TTGH EAV+I
Sbjct: 3 ESKWALATFAGGCFWCMVSPFEEEPGIHQVVSGYTGGHTENPTYKEVCSETTGHYEAVQI 62

Query: 74 IFNPEKISYADLVELYWAQTDPTDAFGQFEDRGDNYRPVIFYENEEQRQIAQKSKDKLQA 133
F+PE Y L+E+YW Q DPTD GQF DRGD+YR I FY +E+Q+Q A SK KL+
Sbjct: 63 SFDPEVFPYEKLLEIYWTQIDPTDPGGQFHDRGDSYRTAIFYHDEQQKQAADASKQKLEE 122

Query: 134 SGRFDRPIVTSIEPADTFYPAEDYHQAFYRTNPARYAL--SSARRHAFLEENW 184
SG+F+ PIVT I PA FYPAE+YHQ +++ NP Y + + R AF++++W
Sbjct: 123 SGKFNAPIVTRILPAKPFYPAEEYHQKYHKKNPFHYKMYRHGSGREAFIKQHW 175





A related DNA sequence was identified in S.pyogenes <SEQ ID 93> which encodes the amino acid 
sequence <SEQ ID 94>. Analysis of this protein sequence reveals the following: 



Final Results 

bacterial cytoplasm --- Certainty=0.0084 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

RGD motif: 89-91 

The protein has homology with the following sequences in the databases: 

>GP:BAB05167 GB:AP001512 peptide methionine sulfoxide reductase 
[Bacillus halodurans] 
Identities = 98/168 (58%) , Positives = 125/168 (74%) , Gaps = 4/168 (2%) 



Query: 4 AIFAGGCFWCMVQPFEEQAGILSVRSGYTGGHLPNPSYEQVCAKTTGHTEAVEIIFDPKQ 63
A FAGGCFWCMV PFEE+ GI V SGYTGGH NP+Y++VC++TTGH EAV+I FDP+
Sbjct: 9 ATFAGGCFWCMVSPFEEEPGIHQVVSGYTGGHTENPTYKEVCSETTGHYEAVQISFDPEV 68

Query: 64 IAYKDLVELYWTQTDPTDAFGQFEDRGDNYRPVIYYTTERQKEIAEQSKANLQASGRFDQ 123
Y+ L+E+YWTQ DPTD GQF DRGD+YR I+Y E+QK+ A+ SK L+ SG+F+
Sbjct: 69 FPYEKLLEIYWTQIDPTDPGGQFHDRGDSYRTAIFYHDEQQKQAADASKQKLEESGKFNA 128

Query: 124 PIVTTIEPAEPFYLAEDYHQGFYKKNP---KRYAQSSAIRHQFLEENW 168
PIVT I PA+PFY AE+YHQ ++KKNP K Y S R F++++W
Sbjct: 129 PIVTRILPAKPFYPAEEYHQKYHKKNPFHYKMYRHGSG-REAFIKQHW 175



An alignment of the GAS and GBS proteins is shown below: 

Identities = 130/168 (77%) , Positives = 148/168 (87%) 

Query: 17 MERAIFAGGCFWCMVQPFEELDGIESVLSGYTGGHVENPTYKEVCSKTTGHTEAVEIIFN 76
MERAIFAGGCFWCMVQPFEE GI SV SGYTGGH+ NP+Y++VC+KTTGHTEAVEIIF+
Sbjct: 1 MERAIFAGGCFWCMVQPFEEQAGILSVRSGYTGGHLPNPSYEQVCAKTTGHTEAVEIIFD 60

Query: 77 PEKISYADLVELYWAQTDPTDAFGQFEDRGDNYRPVIFYENEEQRQIAQKSKDKLQASGR 136
P++I+Y DLVELYW QTDPTDAFGQFEDRGDNYRPVI+Y E Q++IA++SK LQASGR
Sbjct: 61 PKQIAYKDLVELYWTQTDPTDAFGQFEDRGDNYRPVIYYTTERQKEIAEQSKANLQASGR 120

Query: 137 FDRPIVTSIEPADTFYPAEDYHQAFYRTNPARYALSSARRHAFLEENW 184
FD+PIVT+IEPA+ FY AEDYHQ FY+ NP RYA SSA RH FLEENW
Sbjct: 121 FDQPIVTTIEPAEPFYLAEDYHQGFYKKNPKRYAQSSAIRHQFLEENW 168

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 31 

A DNA sequence (GBSx0029) was identified in S.agalactiae <SEQ ID 95> which encodes the amino acid
sequence <SEQ ID 96>. Analysis of this protein sequence reveals the following:

Possible site: 55
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2727 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:



Query: 3 KSFYSWLMTQRNPKSNEPVAILADYAFDETTFPKHSSDFETVSRYLEDBASFSFNLTDFD 62 
KSFY +L+ R+PK 4- ++ A+ A+ + + -i-FPK S+D+ +S YLE A + + FD 

Query: 63 DIWEDY 68 

+ W+ Y 
Sbjct: 62 EAWDQY 67 

A related DNA sequence was identified in S.pyogenes <SEQ ID 97> which encodes the amino acid 
sequence <SEQ ID 98>. Analysis of this protein sequence reveals the following: 

Possible site: 57
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2571 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 59/71 (83%), Positives = 65/71 (91%)

Query: 1 MRKSFYSWLMTQRNPKSNEPVAILADYAFDETTFPKHSSDFETVSRYLEDEASFSFNLTD 60
MRKSFYSWLMTQRNPKSNEPVAILAD  FD+TTFPKH++DFE +SRYLED+ASFSFNL
Sbjct: 3 MRKSFYSWLMTQRNPKSNEPVAILADLVFDDTTFPKHTNDFELISRYLEDQASFSFNLGQ 62

Query: 61 FDDIWEDYLNH 71
FD+IWEDYL H
Sbjct: 63 FDEIWEDYLAH 73
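
The middle line of each alignment repeats the residue where Query and Sbjct are identical, shows "+" for a conservative substitution, and is blank otherwise. The sketch below reproduces only the identity part of that line and counts the identities. Scoring conservative substitutions requires a substitution matrix, which is deliberately left out here, so positions shown as "+" in the listings come out blank. The function names are hypothetical, and the two strings are the final row of the alignment above with the scanning spaces removed.

```python
def identity_midline(query: str, subject: str) -> str:
    """Middle line showing identical residues only (no '+' for similar residues)."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    return "".join(
        q if q == s and q != "-" else " " for q, s in zip(query, subject)
    )


def count_identities(query: str, subject: str) -> int:
    """Number of aligned positions carrying the same residue in both rows."""
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")


query_row = "FDDIWEDYLNH"    # Query 61-71
subject_row = "FDEIWEDYLAH"  # Sbjct 63-73
print(repr(identity_midline(query_row, subject_row)))
print(count_identities(query_row, subject_row), "of", len(query_row))
```

Summing the identity counts over all rows of an alignment should reproduce the numerator of the Identities figure in its summary line.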

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 32 

A DNA sequence (GBSx0030) was identified in S.agalactiae <SEQ ID 99> which encodes the amino acid
sequence <SEQ ID 100>. This protein is predicted to be antigen, 67 kDa (myosin-crossreactive). Analysis
of this protein sequence reveals the following:

Possible site:
44 ( 26 - 45)

Final Results

bacterial membrane --- Certainty=0.2826 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 101> which encodes the amino acid
sequence <SEQ ID 102>. Analysis of this protein sequence reveals the following:

Possible site: 26
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -4.62 Transmembrane 40 - 56 ( 38 - 57)

Final Results

bacterial membrane --- Certainty=0.2848 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related sequence was also identified in GAS <SEQ ID 9109> which encodes the amino acid sequence
<SEQ ID 9110>. Analysis of this protein sequence reveals the following:

3 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane --- Certainty=0.285 (Affirmative) < succ>
bacterial outside --- Certainty=0.000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 477/590 (80%) , Positives = 542/590 (91%) 

Query: 3 MRYTNGNFEAFARPRKPEGVDKKSAYIVGSGLAGLAAAVFLIRDGQMDGQRIHIFEELPL 62 

M YT+GN+EAFA PRKPEGVD+KSAYIVG+GIAGLAAAVFLIRDG M G+RIH+FEELPL 
Sbjct: 15 MYYTSGNYEAFATPRKPEGVDQKSAYIVGTGLAGLAAAVFLIRDGHMAGERIHLFEELPL 74 



Query: 123 SSNCRLIHKQGNRLESDGDFTLGTHSKELVKLVNETEESLGAKTIEEVFSKEFFESNFWT 182 

SSNCRLIHK+GNR++ DG +TLG SKEL+ L+M+TEESLG +TIEE FS++FF+SNFW 
Sbjct: 135 SSNCRLIHKRGNRVDDDGQYTLGKQSKELIHLIMKTEESLGDQTIEEFFSEDFFKSNFWV 194 



Sbjct: 195 YWATMFAFEKWHSAVEMRRYAMRFIHHIDGLPDFTSLKFNKYNQYDSMVKPIIAYLESHD 254 

Query: 243 VDVQFDSKVTNISVDFKNGQKLAKAIHLWGGEAKTIDLTPNDFVFVTNGSITESTNYGS 302 

VD+QFD+KVT+I V+ G+K+AK IH+TV GEAK I+LTP+D VFVTNGS I TES + YGS 
Sbjct: 255 VDIQFDTKVTDIQVEQTAGKKVAKTIHMTVSGEAKAIELTPDDLVFVTNGSITESSTYGS 314 



Sbjct: 315 HHEVAKPTKALGGSWNLWENLARQSDDFGHPKVFYQDLPAESWFVSATATIKHPAIEPYI 374 

Query: 363 ERLTHRDLHDGKVNTGGIVTVTDSNWMMSFAIHRQPHFKEQKENETIVWIYGLYSNVEGN 422

ERLTHRDLHDGKVNTGGI+T+TDSN/JMM3FAIHRQPHFKEQKENET VWIYGLYSN EGN
Sbjct: 375 ERLTHRDLHDGKVNTGGIITITDSN*IM3FAIHRQPHFKEQKENETTVWIYGLYSNSEGN 434

Query: 423 YIKKPIEECTGREITEEKLYHLGVPEMKIHDLSDKQW3TVPOTMPYITSYFMPRVKGDR 482

Y+ K IEECTG+E I TEEWLYHLGVP KI DL+ + Y++TVPVYMPYITSYFMPRVKGDR 
Sbjct: 435 YVHKKIEECTGQEITEEWLYHLGVPVDKIKDLASQDYINTVPVYMPYITSYFMPRVKGDR 494 

Query: 483 PDVIPQGSVNLAFIGNFAESPSRDTVFTTEYSIRTAMEAVYTFLNIERGVPEVFNSAFDI 542 

P VIP GSVNIAFIGNFAESPSRDTVFTTEYSIRTAMFAVY+FLM+ERG+PEVFNSA+DI 
Sbjct: 495 PKVIPDGSVNLAFIGNFAESPSRDTVFTTEYSIRTAMEAVYSFLNVERGIPEVFNSAYDI 554 

Query: 543 RVLLQSLYYLNDKKSVEDMDLPIPALMRKVGMKKIRGTYLEELLREAHLL 592 

R LL++ YYLNDKK+++DMDLPIPAL+ K+G KKI+ T++EELL++A+L+ 
Sbjct: 555 RELLKAFYYLNDKKAIKDMDLPIPALIEKIGHKKIKDTFIEELLKDANLM 604 

A related GBS gene <SEQ ID 8475> and protein <SEQ ID 8476> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 10 
McG: Discrim Score: -19.82 
GvH: Signal Score (-7.5): -1.16 

Possible site: 14
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -4.57 threshold: 0.0

INTEGRAL Likelihood = -4.57 Transmembrane 26 - 42 ( 26 - 45)
PERIPHERAL Likelihood = 6.79 378
modified ALOM score: 1.41

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.2826 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

SEQ ID 8476 (GBS90) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 18 (lane 6; MW 68.5kDa). 

The GBS90-His fusion product was purified (Figure 194, lane 11) and used to immunise mice. The 
resulting antiserum was used for Western blot (Figure 256A), FACS (Figure 256B), and in the in vivo 
passive protection assay (Table III). These tests confirm that the protein is immunoaccessible on GBS 
bacteria and that it is an effective protective immunogen. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 33 

A DNA sequence (GBSx0031) was identified in S.agalactiae <SEQ ID 103> which encodes the amino acid
sequence <SEQ ID 104>. This protein is predicted to be PhoH-like protein (phoH). Analysis of this protein
sequence reveals the following:

Possible site: 38
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2339 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB14476 GB:Z99117 phosphate starvation-induced protein

Query: 27

L++PD+ +SLFG+ + LKL+E++L++ I R E + V GD +E+ + A + 4LL L+ 
Sbjct: 12 LKNPDE^SLFGNQDSFLKIWEKDLNLNIITRGETIYVSGD-DESFQIADRLLGSLLALI 70 

Query: 87 1TOGMTONTSDVVTALSMAQNGSIDKFVALYEEEIIKDSYGKPIRVKTLGQKIYVDSVKNH 146 

+G+ ++ DV+ A+ MA+ ++ F ++YEEEI K++ GK IRVKT+GQ+ YV ++K + 
Sbjct: 71 RKGIEI SERDVI YAIKMAKKNELEYFESMYEEE I TKNAKGKSIRVKTMGQREYVAAMKRN 130 

Query: 147 DWFGIGPAGTGKTFIAVTIAVTM.KRGQVICRIILTRPAVEAGESLGFLPGDLKEKVDPY 206 

D+VFGIGPAGTGKT+IAV AV ALK G +K+IILTRPAVEAGESLGFLPGDLKEKVDPY 
Sbjct: 131 DLVFGIGPAGTGKTYIAWKAVHALKNGHI KKI I LTRPAVEAGESLGFLPGDLKEKVDPY 190 

Query: 207 LRPVYDALYQILGKEQTSRLMEREIIEIAPLAYMRGRTLDDAFVILDEAQNTTIMQMKMF 266 

LRP+YDAL+ +LG + T RLMER IIEIAPLAYMRGRTLDDA+VILDEAQNTT QMKMF 
Sbjct: 191 LRPLYDALHDVLGADHTERLMERGIIEIAPLAYMRGRTLDDAYVILDEAQNTTPAQMKMF 250 

Query: 267 LTRLGFNSKMIVNGDVSQIDLPKNVKSGLIDAVEKLRNIKKIDFIHLSAKDWRHPWAE 326 

LTRLGF+SKMI+ GDVSQIDBPK VKSGL A E L+ I I I L DWRHP+VA+ 
Sbjct: 251 LTRLGFSSKMI1TGDVSQIDLPKGVKSGLAVAKEMLKG1DGISMIELDQTDWRHPLVAK 310 

Query: 327 IINAY 331 
II AY 

Sbjct: 311 IIEAY 315 

A related DNA sequence was identified in S.pyogenes <SEQ ID 105> which encodes the amino acid
sequence <SEQ ID 106>. Analysis of this protein sequence reveals the following:

Possible site: 42
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.85 Transmembrane 54 - 70 ( 54 - 70)

Final Results

bacterial membrane --- Certainty=0.1341 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
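
Each INTEGRAL line reports a likelihood score and the residue range of a predicted transmembrane segment, followed by a second range in parentheses. A minimal sketch of extracting those values is shown below. The class and function names are hypothetical, and reading the parenthesised pair as an outer range around the segment is an assumption made for illustration only.

```python
import re
from dataclasses import dataclass
from typing import List

# "INTEGRAL Likelihood = -0.85 Transmembrane 54 - 70 ( 54 - 70)"
_SEGMENT = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


@dataclass
class Segment:
    likelihood: float
    start: int
    end: int
    outer_start: int  # first number in parentheses (assumed outer range)
    outer_end: int    # second number in parentheses (assumed outer range)

    @property
    def length(self) -> int:
        """Number of residues in the segment, both ends included."""
        return self.end - self.start + 1


def parse_segments(text: str) -> List[Segment]:
    """Collect every complete INTEGRAL line found in the text."""
    return [
        Segment(float(m.group(1)), *(int(m.group(i)) for i in range(2, 6)))
        for m in _SEGMENT.finditer(text)
    ]


segments = parse_segments(
    "INTEGRAL Likelihood = -0.85 Transmembrane 54 - 70 ( 54 - 70)"
)
print(segments[0], segments[0].length)
```

In the listings reproduced here each complete segment spans 17 residues, which offers a simple check on a transcribed coordinate pair.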

An alignment of the GAS and GBS proteins is shown below: 

Identities = 274/322 (85%), Positives = 298/322 (92%)

Query: 18
Sbjct: 1

Query: 78
TI+ALLVLV RGM VNTSDWTALSMA++ ID+F+ALYEEEIIKD+YGK IRVKTLGQK
Sbjct: 61

Query: 138
YVDSVK HDWFG+GPAGTGKTFIAVTEAVTALKRGQVKRI ILTRPAVFAGESLGFLPG
Sbjct: 121

Query: 198
ILGKEQT+RLMER++IEIAPLAYMRGRTLDDAFVILDEAQN
Sbjct: 181

Sbjct: 241

Query: 318
WRHPWA+II AY S
Sbjct: 301

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 34 

A DNA sequence (GBSx0032) was identified in S.agalactiae <SEQ ID 107> which encodes the amino acid 
sequence <SEQ ID 108>. Analysis of this protein sequence reveals the following: 

Possible site: 30
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0275 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 35 

A DNA sequence (GBSx0033) was identified in S.agalactiae <SEQ ID 109> which encodes the amino acid
sequence <SEQ ID 110>. This protein is predicted to be MutT/nudix family protein. Analysis of this protein
sequence reveals the following:

Possible site: 46
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2383 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF09597 GB:AE001864 MutT/nudix family protein [Deinococcus radiodurans] 
Identities = 49/136 (36%), Positives = 69/136 (50%), Gaps = 8/136 (5%) 

Query: 5 YISYIRSKVGHETIFLTYSGGILTDGKGRVLLQLRADKNSWGIIGGCMELGESSVDTLKR 64
Y+S +R+ GH + +L D GRVLLQ R D WGI+GG +E GE + R
Sbjct: 6 YLSELRAVWGHRALPAAGVSVLLQDETGRVLLQRRGDDGQWGILGGGLEPGEDFLIARHR 65

Query: 65 EFFEETGLRVEPIRLLNVY------TNFQDSYPNGDKAQTVGFIYEVSCPKPVNIEGFHN 118
E EETGLR +R L + F YPNGD+ VG E + P + +
Sbjct: 66 ELriEETGLRCPNLRPLPLSEGLVSGPQFWHRYPNGDEVYLVGLRTEGTVPAfiALTDACPD 125

Query: 119 E--ETLQLDYFSKEDV 132
+ ETL+L +F+ +D+
Sbjct: 126 DGGETLELRWFALDDL 141

A related DNA sequence was identified in S.pyogenes <SEQ ID 111> which encodes the amino acid 
sequence <SEQ ID 112>. Analysis of this protein sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4375 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 93/157 (59%), Positives = 123/157 (78%)

Query: 1 MKQDYISYIRSKVGHETIFLTYSGGILTDGKGRVLLQLRADKNSWGIIGGCMELGESSVD 60

M QDYISYIRSKVGH+ I L ++GGILT+ G+VL+QLR DK +W I GG MELGESS++
Sbjct: 16 MPQDYISYIRSKVGHDKIILNFAGGILTNDDGKVLMQLRGDKKTWTIPGGTMELGESSLE 75

Query: 61 TLKREFFEETGLRVEPIRLLNVYTNFQDSYPNGDKAQTVGFIYEVSCPKPVNIEGFHNEE 120

T KREF EETG+ VE +RLLNVYT+F++ YPNGD QT+ FIYE++ + I+ FHNEE
Sbjct: 76 TCKREFLEETGIEVEAVRLLNVYTHFEEVYPNGDAVQTIVFIYELTAVSDMAIDNFHNEE 135

Query: 121 TLQLDYFSKEDVKNITIVNEQHQLILDEYFSQTFQMG 157

TL+L +FS E++ + V+ +H+L+L+EYFS +F MG
Sbjct: 136 TLKLQFFSHEEIAELESVSAKHRLMLEEYFSDSFAMG 172

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
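
Each alignment in these examples is headed by a summary line of the form "Identities = 49/136 (36%), Positives = 69/136 (50%), Gaps = 8/136 (5%)". A minimal sketch for reading such a line and checking that each printed percentage agrees with its fraction is given here. The function names `parse_summary` and `consistent` are hypothetical. The check accepts either a truncated or a rounded percentage, because the listings appear to truncate (69/136 is about 50.7% but is printed as 50%); that tolerance is an assumption, not a documented property of the search program.

```python
import re

# One "Name = count/length (percent%)" field; spacing around the punctuation varies.
_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)


def parse_summary(line):
    """Return {field: (count, length, reported_percent)} for one summary line."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in _FIELD.findall(line)
    }


def consistent(stats):
    """True if every reported percentage matches count/length (truncated or rounded)."""
    for count, length, reported in stats.values():
        exact = 100.0 * count / length
        if reported not in (int(exact), round(exact)):
            return False
    return True
```

A check of this kind is useful mainly for spotting digits that were misread during scanning, since a damaged count or length will usually no longer agree with the printed percentage.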

Example 36

A DNA sequence (GBSx0034) was identified in S.agalactiae <SEQ ID 113> which encodes the amino acid
sequence <SEQ ID 114>. Analysis of this protein sequence reveals the following:

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3690 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 37 

A DNA sequence (GBSx0035) was identified in S.agalactiae <SEQ ID 115> which encodes the amino acid
sequence <SEQ ID 116>. Analysis of this protein sequence reveals the following:

Possible site: 25

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database:

>GP:AAG05249 GB:AE004612 hypothetical protein [Pseudomonas aeruginosa]
Identities = 70/254 (27%), Positives = 127/254 (49%), Gaps = 2/254 (0%)

Query: 2 KITLHGVAETLLITLYIFAKDAMAKHPILNDQKSLAIVEQIEYDFDKFDNSEASFYATLA 61
+ITL G +TLLITLY +A D+ IL+D+ + V QI++DF + + + A
Sbjct: 5 RITLTGEKQTLLITLYAKALDSRLDDSILHDRFAEEAVRQIDFDFSRVALGKGNERALAM 64

Query: 62
++ F + +p Q+L++GCGLD+R RVD ++ W++LD PEVM++R+
Sbjct: 65

Query: 121
P+ P L+++EG++ +L+E V
Sbjct: 125

Query: 181
Sbjct: 185

Query: 241
Sbjct: 245



No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8477> and protein <SEQ ID 8478> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 9 
McG: Discrim Score: 0.37 
GvH: Signal Score (-7.5): -0.97 

Possible site: 25
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 4.35 threshold: 0.0 
PERIPHERAL Likelihood = 4.35 143 
modified ALOM score: -1.37 

*** Reasoning Step: 3 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

27.6/51.6% over 253aa 

Pseudomonas aeruginosa 

GP | 9947849 | hypothetical protein Insert characterized 
ORF02096(304 - 1059 of 1404) 

GP|9947849|gb|AAG05249.1|AE004612_3|AE004612(5 - 258 of 275) hypothetical protein
{Pseudomonas aeruginosa}
%Match =11.6 

%Identity =27.6 %Similarity =51.6 

Matches = 70 Mismatches = 121 Conservative Sub.s = 61 

255 285 315 345 375 405 435 465 

E*YT*PJ^PVLEIQISK*NSIKESR*MKITLHGVAETLLITLYIRAKDAMAKHPILiroQKSmivEQIEYDFDKFDNSEAS 
= 111 I :||lllll =1 I: 1 I : I = = I ll-ll = = = 

MPGHRITLTGEKQTLLITLYAKALDSRLDDSILHDRFAEEAVRQIDFDFSRVALGKGN 



FYATLARXRVMDREIKKFIRENPNSQILSIGCGLDTRFERTON-GQIRVJYNLDLPEVMEIRKLFFEEHERVTNIAKSALD
I I I: -I: =1 M-IIIIM III == |: = || llll-N - = = = l
ETWTRETOPQNAPFLIVSEGVLMFLiCEDDVETFLHILTNSFSQFMAQFDLCHKEMINKGKQHDTVKYMDTEFQFGITDGH
DDGWLQGVPRERPMjVIAEGLMPYLRESQVRRLVERLVDHLGSGELLFDGYGRLGIMLIJILYPPLRETGAQVHWSIDDPR
EIVDLDPKLKQINLINFTDEMSKFELG-TLRSLLPTIRKFNNCLGVYEYKASEKK*QKSIYIKRHSKCKFVIIVIAFVAL

SEQ ID 8478 (GBS176) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 36 (lane 5 & 6; MW 30kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 41 (lane 7; MW 55.4kDa).

The GBS176-GST fusion product was purified (Figure 117A; see also Figure 202, lane 5) and used to 
immunise mice (lane 1+2 product; 13.5µg/mouse). The resulting antiserum was used for Western blot
(Figure 117B), FACS (Figure 117C), and in the in vivo passive protection assay (Table III). These tests 
confirm that the protein is immunoaccessible on GBS bacteria and that it is an effective protective 
immunogen. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 38 

A DNA sequence (GBSx0036) was identified in S.agalactiae <SEQ ID 117> which encodes the amino acid
sequence <SEQ ID 118>. Analysis of this protein sequence reveals the following:

Possible site: 32

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3712 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10019> which encodes amino acid sequence <SEQ ID 
10020> was also identified. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC38045 GB:AF000954 No definition line found [Streptococcus mutans]
Identities = 140/164 (85%), Positives = 157/164 (95%)

Query: 1 MYVEMIDETGQVSEDIKKQTLDLLEFAAQKTGKENKEMAVTFVTNERSHELNLEYRDTDR 60

MY+EMIDET QVSE IK QTLD+LEFAAQKTGKE+KE^VTFVTNERSHELNL+YRDT+R
Sbjct: 1 MYIEMIDETNQVSEGIKNQTLDILEFAAQKTGKEDKE^VTFVTNERSHELNLKYRDTNR 60



Query: 121 EREMGFLAVHGFLHINGYDHYTPEEEKEMFSLQEEILTAYGLKR 164

EREMGFLAVHGFLHINGYDHYTP+EEKEMFSLQEEIL AYGLKR
Sbjct: 121 EREMGFLAVHGFLHINGYDHYTPQEEKEMFSLQEEILDAYGLKR 164

A related DNA sequence was identified in S.pyogenes <SEQ ID 119> which encodes the amino acid
sequence <SEQ ID 120>. Analysis of this protein sequence reveals the following:

Possible site: 49

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1145 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 138/165 (83%), Positives = 153/165 (92%)

Query: 1 MYVEMIDETGQVSEDIKKQTLDLLEFAAQKTGKENKEMAVTFVTNERSHELNLEYRDTDR 60

MY+EMIDETGQVS++I +QTLDLL FAAQKTGKE KEM+VTFVTNERSHELNLEYRDTDR
Sbjct: 18 MYIEMIDETGQVSQEIMEQTLDLLNFAAQKTGKEEKEMSVTFVTNERSHELNLEYRDTDR 77

Query: 61 PTDVISLEYKPEVDISFDEEDLAENPELAEMLEDFDSYIGELFISIDKAKEQAEEYGHSY 120

PTDVISLEYKPE I F +EDLA +P LAEM+ +FD+YIGELFISIDKA+EQ++EYGHS+
Sbjct: 78 PTDVISLEYKPETPILFSQEDLAADPSLAEMMAEFDAYIGELFISIDKAREQSQEYGHSF 137

Query: 121 EREMGFLAVHGFLHINGYDHYTPEEEKEMFSLQEEILTAYGLKRQ 165

EREMGFLAVHGFLHINGYDHYT EEEKEMF+LQEEILTAYGL RQ
Sbjct: 138 EREMGFLAVHGFLHINGYDHYTLEEEKEMFTLQEEILTAYGLTRQ 182

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.


Example 39 

A DNA sequence (GBSx0038) was identified in S.agalactiae <SEQ ID 121> which encodes the amino acid 
sequence <SEQ ID 122>. This protein is predicted to be phosphoglycerate dehydrogenase (serA).
Analysis of this protein sequence reveals the following: 

Possible site: 59

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2817 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB99020 GB:U67544 phosphoglycerate dehydrogenase (serA) 
[Methanococcus jannaschii]
Identities = 82/232 (35%), Positives = 132/232 (56%), Gaps = 14/232 (6%)

Query: 3 ENPDAYIIRSQNLHNQDF---PSNLKAIARAGAGTNNIPIEEASAQGIVVFNTPGANANA 59

++ D ++RS +D LK I RAG G +NI +E A+ +GI+V N P A++ +

Sbjct: 40 KDADVLVVRSGTKVTRDVIEKAEKLKVIGRAGVGVDNIDVEAATEKGIIVVNAPDASSIS 99

Query: 60 VKEAVIAALLLSARDYLGANRWVNTLTGTDIPKQIEAGKKAFAGNEIAGKKLGVIGLGAI 119

V E + +L +AR N T K+ E +K F G E+ GK LGVIGLG I

Sbjct: 100 VAELTMGLMLAAAR---------NIPQATASLKRGEWDRKRFKGIELYGKTLGVIGLGRI 150

Query: 120 GARIANDARRLGMTVLGYDPYVSIETAWNISSHVQRVKEIKDIFETCDYITIHVPLTNET 179

G ++ A+ GM ++GYDPY+ E A ++ V+ V +I ++ + D+IT+HVPLT +T
Sbjct: 151 GQQVVKRAKAFGMNIIGYDPYIPKEVAESMG--VELVDDINELCKRADFITLHVPLTPKT 208

Query: 180 KHTFDAKAFSIMKKGTTIINFARAELVNNQELFEAIETGVVKRYITDFGDKE 231

+H + ++MKK I+N AR L++ + L+EA++ G ++ D ++E
Sbjct: 209 RHIIGREQIALMKKNAIIVNCARGGLIDEKALYEALKEGKIRAAALDVFEEE 260

A related DNA sequence was identified in S.pyogenes <SEQ ID 123> which encodes the amino acid 
sequence <SEQ ID 124>. Analysis of this protein sequence reveals the following: 

Possible site: 52

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2354 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 52/198 (26%), Positives = 93/198 (46%), Gaps = 14/198 (7%)

Query: 24 LKAIARAGAGTNNIPIEEASAQGIVVFNTPGANANAVKEAVIAALLLSARDYLGANRWVN 83

+K IA+ A + + + A+ I++ N P + ++ E + +L R
Sbjct: 70 IKQIAQHSASVDMYNLDLATENDIIITNVPSYSPESIAEFTVTIVLNLIRHV 121

Query: 84 TLTGTDIPKQIEAGKKAFAGNEIAGKKLGVIGLGAIGARIANDARRLGMTVLGYDPYVSI 143
L ++ KQ G + + +IG G IG A + G V+GYD Y S

Sbjct: 122 ELIRENVKKQNFTWGLPIRGRVLGDMTVAIIGTGRIGLATAKIFKGFGCKVVGYDIYQS- 180

Query: 144 ETAWNISSHVQRVKE-IKDIFETCDYITIHVPLTNETKHTFDAKAFSIMKKGTTIINFAR 202

+ A + + + V+E IKD D +++H+P T E H F++ F KKG ++N AR

Sbjct: 181 DAAKAVLDYKESVEEAIKD----ADLVSLHMPPTAENTHLFNSDLFKSFKKGAILMNMAR 236

Query: 203 AELVNNQELFEAIETGVV 220

++ Q+L +A++ G++
Sbjct: 237 GAVIETQDLLDALDAGLL 254

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 40 

A DNA sequence (GBSx0039) was identified in S.agalactiae <SEQ ID 125> which encodes the amino acid 
sequence <SEQ ID 126>. This protein is predicted to be alpha-glycerophosphate oxidase. Analysis of this 
protein sequence reveals the following:

Possible site: 50

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2067 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:



Query: 1 MLFMRDNLDSLIQPVIDEMAKHYQWSDQDKTFYEEELHETLKDNDLAAL 49 

MLFMRD+LDS+++PV+DEM + Y W++++K Y ++ L +NDLA L 
Sbjct: 558 MLFMRDSLDSIvEPVLDEMGRFYDWTEEEKATYR.^VEAALANNDLAEL 606

A related DNA sequence was identified in S.pyogenes <SEQ ID 127> which encodes the amino acid
sequence <SEQ ID 128>. Analysis of this protein sequence reveals the following:



Possible site: 40
>>> Seems to have no N-terminal signal sequence

Likelihood = -1.81 Transmembrane

Final Results

bacterial membrane --- Certainty=0.1723 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 



Query: 1 MEFSRETRRLALQKMQERDLDLLIIGGGITGAGVALQAAASGLDTGLIEMQDFAQGTSSR 60 

MEFS++TR L+4+KMQER LDLLIIGGGITGAGVALQAAASGL+TGLIEMQDFA+GTSSR 
Sbjct: 1 MEFSKKTRELSIKKMQERTLDLLIIGGGITGAGVALQAAASGLETGLIEMQDFAEGTSSR 60 

Query: 61 STKLVHGGLRYLKQFDTOWSDWSERAWQQIAPHIPKPDPMLLPVYDEPGSTFSMFRL 120 

STKLVHGGLRYLKQFDVEWSDTVSERAWQQIAPHI PKPDPMLLPVYDE G+TFS+FRL 
Sbjct: 61 STKLVHGGLRYLKQFDVEWSDTVSERAWQQIAPHIPKPDPMLLPVYDEDGATFSLFRL 120 

Query: 121 KVAMDLYDLIAGVS^^^PAANKVLTKEEVLKREPDLKQEGLLGGGWLDFRNNDARLVIEN 180 

KVAMDLYDLLAGVSNTP ANKVL+K++VL+R+P+LK+EGL+GGGVYLDFRNNDARLVIEN 
Sbjct: 121 KVAMDIaYDLIAGVSOTPTANKVLSKDQVLERQPNLKKEGLVGGGVYLDFRNNDARLVIEN 180 

Query: 181 IKRANRDGALIASHVKAEDFIiLDDNGKIIGVKARDLLSDQEIIIKAKLVINTTGPWSDEI 240 

IKRAN+DGALIA+HVKAE FL D++GKI GV ARDLL+DQ IKA+LVINTTGPWSD++ 
Sbjct: 181 IKRANQDG^IANHVKAEGFLFDESGKITGVVARDLLTDQVFEIKARLVINTTGPWSDKV 240 

Query: 241 RQFSHKGQPIHQMRPTKGVTILVVDRQKLPVSQPVYVDTGLNDGRMVFVLPREEKTYFGTT 300 

R S+KG QMRPTKGVHLWD K+ VSQPVY DTGL DGRMVFVLPRE KTYFGTT 
Sbjct: 241 RNLSNKGTQFSQMRPTKGVHLWDSSKIKVSQPVYFDTGLGDGRMVFVLPRENKTYFGTT 300 

Query: 301 DTDYTGDLEHPQVTQEDVDYLLGVVNNRFPNANVTIDDIESSWAGIiRPLLSGNSASDYNG 360 

DTDYTGDLEHP+VTQEDVDYLLG+VNNRFP +N+TIDDIESSWAGLRPL++GNSASDYNG 
Sbjct: 301 DTDYTGDLEHPKVTQEDVDYLLGIVNNRFPESNITIDDIESSWAGLRPLIAGNSASDYNG 360 

Query: 361 GNSGKVSDDSFDHLVDTVKAYINHEDSREAVEKAIKQVETSTSEKELDPSAVSRGSSFER 420 

GN+G +SD+SFD+L+ TV++Y++ E +RE VE A+ ++E+STSEK LDPSAVSRGSS +R 
Sbjct: 361 GNNGTISDESFDNLIATVESYLSKEKTREDVESAVSKliESSTSEKHLDPSAVSRGSSLDR 420 

Query: 421 DENGLFTLAGGKITDYRKMAEGALTGIIQILKEEFGKSFKLINSKTYPVSGGEINPANVD 480 

D+NGL TLAGGKITDYRKMAEGA+ ++ ILK EF +SFKLINSKTYPVSGGE+NPANVD 
Sbjct: 421 DDNGLLTLAGGKI TDYRKMAEGAMERWD I LKAE FDRS FKLINSKTYPVSGGELNPANVD 480 

Query: 481 SEIEAYAQLGTLSGLSMDDARYLANLYGSNAPKVFALTRQLTAAEGLSLAETLSLHYAMD 540 

SEIEA+AQLG GL +A YLANLYGSNAPKVFAL L A GLSLA+TLSLHYAM 
Sbjct: 481 SEIFAFAQLGVSRGLDSKEAHYIiAl^LYGSNAPKVFALAHSLEQAPGLSLADTLSLHYAMR 540 

Query: 541 YEMALKPTDYFLRRTNHLLFMRDSLDALIDPVINEMAKHFEWSDQERVAQEDDLRRVIAD 600 

E+AL P D+ LRRTNH+LFMRDSLD++++PV++EN + ++W+++E+ D+ +A+ 

Sbjct: 541 NEIiALSPVDFLLRRTNHMLFMRDSLDSIV^PVLDEMGRFYDWTEEEKATYPADVEAALAN 600 

Query: 601 NDLSALK 607 

NDL+ LK 
Sbjct: 601 NDLAELK 607 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 29/49 (59%), Positives = 41/49 (83%)

Sbjct: 558 LLFMRDSLDALIDPVINEMAKHFEWSDQERVAQEDDLRRVIADNDLSAL 606

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 41 

A DNA sequence (GBSx0040) was identified in S.agalactiae <SEQ ID 129> which encodes the amino acid 
sequence <SEQ ID 130>. Analysis of this protein sequence reveals the following: 
Possible site: 40

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1011 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB06309 GB:AP001516 unknown conserved protein [Bacillus halodurans] 
Identities = 70/160 (43%), Positives = 106/160 (65%), Gaps = 3/160 (1%) 

Query: 5 TRPTTDKVKGAIFNMIGPFFEGGRVLDLFSGSGSLAIEAISRGMDQAVLVEKDRRAQWI 64 

TRPTTDKVK AIFNMIGPFF4GG LDL+ GSG L IEA+SRG+++ + V++ +RA I 
Sbjct: 21 TRPTTDKVKEAIFNMIGPFFDGGIGLDLYGGSGGLGIEALSRGVERMIFVDQQKRAIETI 80 

Query: 65 QENIAMTKSPEQFQLLKMEANRALEQLTGQ FDLVLLDPPYAKEEIVKQIQIMDSKGL 121 

++ N ++ + ++ + +A RAL+ LT + F V LDPPYAK+ I + 1+ + GL 

Sbjct: 81 KQOTjSHCGLEGRAEVYRNDAKRALQVLTIOXGIVFAYVPLDPPYAKQTIKKDLAIIiANHGL 140 

Query: 122 LGDDIMIACETDKSVDLPEEIASFGIWKQKIYGISKVTVY 161 

L + ++ CE D+ LP++I K+4 YG + +T+Y 

Sbjct: 141 LEEGGWVCEHDRDTMLPDQIEYAVKHKEETYGDTMITIY 180 

A related DNA sequence was identified in S.pyogenes <SEQ ID 131> which encodes the amino acid 
sequence <SEQ ID 132>. Analysis of this protein sequence reveals the following: 
Possible site: 58

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3814 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 111/160 (69%) , Positives = 136/160 (84%) 

Query: 3 RTTRPTTDKVKGAI FNMIGPFFEGGRVLDLFSGSGSLAIEAI SRGMDQAVLVEKDRRAQV 62 

+ TRPT+DKV+GAIFNMIGP+F GGRVLDLF+GSG LAIEA+SRGM AVLVEK+R+AQ 
Sbjct: 19 KITRPTSDKVRGAIFNMIGPYFKGGRVIjDLFAGSGGIAIKWSRGMSAAVLVEKNRKAQA 78 

Query: 63 VIQENIAMTKSPEQFQLLKMEANRALEQLTGQFDLVLIDPPYAKEEIVKQIQIMDSKGLL 122 

+IQ+NI MTK+ +F LLKMEA RA++ LTG+FDLV LDPPYAKE IV 1+ + +K LL 
Sbjct: 79 IIQDNIIMTKAENRFTLLKMEAERAIDCLTGRFDLVFLDPPYAKETIVATIEALAAKNLL 138 

Query: 123 GDDIMIACETDKSVDLPEEIA3FGIWKQKIYGISKVTVYV 162 

+ +M+ CETDK+V BP+EIA+ GIWK+KIYGISKVTVYV 
Sbjct: 139 SEQVMWCETDKTVLLPKEIATLGIWKEKIYGISKVTVYV 178

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 42 

A DNA sequence (GBSx0041) was identified in S.agalactiae <SEQ ID 133> which encodes the amino acid 
sequence <SEQ ID 134>. This protein is predicted to be lipopolysaccharide core biosynthesis protein kdtB 
(kdtB). Analysis of this protein sequence reveals the following: 

Possible site: 17

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1937 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAB13272 GB:AP001119 lipopolysaccharide core biosynthesis
protein kdtB [Buchnera sp. APS]
Identities = 56/149 (37%), Positives = 94/149 (62%)

Query: 1 MTKKALFTGSFDPVTNGHLDIIERASYLFDHVYIGLFYNLEKQGYFSIECRKKMLEEAIR 60 

M K A++ G+FDP+T GHLDII RA+ +FD + I + N K+ F+++ R ++ + 
Sbjct: 1 MNKTAIYPGTFDPITYGHLDIITRATKIFDSITIAISKNFTKKPIFNLKERIELTRKVTIj 60 

Query: 61 QFKNVSVLVAQDRLAVDIJUlEVGAKYFVRGI^SQDFDYEANLEFFNKQIJfflDIETVYLS 120 

KNV ++ + L +LA++ A +RG+R DFDYE L NKQ+ D+++++L 
Sbjct: 61 HLKNVKKILGFNDLIiANLAKKEKANILIRGVRTIFDFDYEIKLAAINKQIYPDLDSIFLL 120 

Query: 121 TSPSLSPISSSRIRELIHFKASVKPFVPK 149 

+S +S 1SSS ++E+ +K +KP++PK 
Sbjct: 121 SSKEVSFISSSFVKEIAKYKGDIKPYLPK 149 

A related DNA sequence was identified in S.pyogenes <SEQ ID 135> which encodes the amino acid 
sequence <SEQ ID 136>. Analysis of this protein sequence reveals the following: 
Possible site: 61

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1862 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 88/161 (54%) , Positives = 124/161 (76%) 

Query: 1 MTKKALFTGSFDPVTNGHLDIIERASYLFDHVYIGLFYNLEKQGYFSIECRjqCMLEEAIR 60 

+TK L+TGSFDPVTNGHLDI++RAS LFD +Y+G+F N K+ YF +E RK ML +A+ 
Sbjct: 2 LTKIGLYTGSFDPVTNGHLDIVKRASGLFDQIYVGIFDNPTKKSYFKLEVRKAMLTQALA 61 

Query: 61 QFKNVSvLVAQDRIAVDLAREVGAKYFVRGLRI^SQDFDYEANLEFFNKQIADDIElVYLS 120 

F NV V+ + +RLA+D+A+E+ + +RGLEN+ DF+YE NLE+FN LA +IETVYL 
Sbjct: 62 DFIWIVVTSHERIAIDVAKELRVTHLIRGLRI^ATDFEYEENLEYETEL^PNIETVYLI 121 

Query: 121 TSPSLSPISSSRIRELIHFKASvKEFVPKSWREVEKMSEE 161 

+ +SSSR+RELIHF++S4+ VP+SV+ +VEKM+E+ 

Sbjct: 122 SRNKWQALSSSRWELIHFQSSLEGLVPQSVIAQVEKMNEK 162

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 43 

A DNA sequence (GBSx0042) was identified in S.agalactiae <SEQ ID 137> which encodes the amino acid
sequence <SEQ ID 138>. Analysis of this protein sequence reveals the following: 

Possible site: 15

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1126 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 44 

A DNA sequence (GBSx0043) was identified in S.agalactiae <SEQ ID 139> which encodes the amino acid 
sequence <SEQ ID 140>. Analysis of this protein sequence reveals the following: 

Possible site: 25 



Final Results 

bacterial membrane --- Certainty=0.5416 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB13378 GB:Z99111 ylbL [Bacillus subtilis] 
Identities = 124/344 (36%), Positives = 199/344 (57%), Gaps = 21/344 (6%) 

Query: 20 WIIGFAFLLLVLASLVWLPYYLEMFGGATOIRSVLIWNKKADKAKGSYNFVAVSVSQAT 79 

W++ L+ VL+ ++LPYY+ PG A ++ S++KV + KGS + + V V A 

Sbjct: 9 WM1jVILILIAVLS--FIKLPYYITKPGEATELASLIIWEGGYPE-KGSLSLMTVICVGPAN 65 

Query: 80 PAQVLYAWLTPFTEL SSKEETTGGFSNDDYLRIMQFYMETSQNESIYQALKLANKQ 135

P ++A + P+ E+ S KEE G S+ +Y++ M++SQ ++ A + A K+ 

Sbjct: 66 PFTYVWAKMHPYYEIVPDESIKEE GESDKEYMKRQLQMMKSSQENAVIAAYQKAGKK 122 

Query: 136 VSLTYKG\nA7LNLAKNSTFKDRLHLADTTCGVNGKSFKNSSQLIKYVAALHLGDKVKVQY 195 

VS ++ G+Y ++ +N K ++ + D + +GK+++++ +LI Y+++ GDKV ++ 
Sbjct: 123 VSYSFNGIYASSVVEIMPAKGKIEVGDiaiSADGKI^QSAEKLIDYISSKKAGDKVTLKI 182 



Query: 254 IYDQLVKEDLRKGRKIAGTGTIEQNGHV 

IY+QL K D KG IAGTGTI+ +G VG IGG KW+A K G DIFF PN N 
Sbjct: 243 IYNQLTKPDETKGYDIAGTGTIDVTX3KVGPIGGIDQKWAADKAGKDIFFAPNQNGASN- 301

Query: 314 KKGKTKVQTNYQEAKMAKRLGTKMKIVPVQNVQQAIDYLKKTK 357

++Y+ A AK + + MKXVPV +Q AIDYL K K 
Sbjct: 302 --- SDYKNAVKTAKDIDSNMKIVPVDTMQDAIDYLKKLK 337 



A related DNA sequence was identified in S.pyogenes <SEQ ID 141> which encodes the amino acid 
sequence <SEQ ID 142>. Analysis of this protein sequence reveals the following: 

Possible site: 23
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-10.24 Transmembrane 10 - 26 ( 6 - 34)

Final Results

bacterial membrane --- Certainty=0.5097 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB13378 GB:Z99111 ylbL [Bacillus subtilis] 
Identities = 132/348 (37%), Positives = 198/348 (55%), Gaps = 16/348 (4%)

Query: 1 MKRLKKIKWWLVGLLALISLLIALFFPLPYYIEMPGGAYDIRTVLQVNGKEDKRKaa,YQF 60 

M R K W LV +L LI++L F LPYYI PG A ++ ++++V G + KG+ 
Sbjct: 1 MLRKKHFSWMLV- ILILIAVLS- - FIKLPYYITKPGEATELASLIKVEGGYPE-KGSLSL 56 

Query: 61 VAVGISRASLAQLLYAWLTPFTEISTAEDTTG-GYSDADFLRINQFYMETSQNAAIYQAL 119 

+ V + A+ ++A + P+ EI E G SD ++++ M++SQ A+ A 

Sbjct: 57 MTVKVGPANPFTYVWAKMHPYYEIVPDESIKEEGESDKEYMKRQLQMMKSSQENAVIAAY 116 

Query: 120 SI^GKPVTLDYKGvYVLDvNNESTFKGTLHIJUJTVTGVNGKQFTSSAELIDYVSHLKLGD 179 

AGK V+ + G+Y V KG + + D + +GK + S+ +LIDY+S K GD 

Sbjct: 117 QKAGKKVSYSFNGIYASSWENMPAKGKIEVGDKIISADGKNYQSAEKLIDYISSKKAGD 176 

Query: 180 EVTVQFTSDNKPKKGVGRIIKLKN--GKNGIGIALTDHTSVNSEDTVIFSTKGVGGPSAG 237 
+VT++ + K K+ + + + + GIG++L +V E + F + +GGPSAG 

Query: 238 LMFTLDIYDQITKEDLRKGRTIAGTGTIGKDGEVGDIGGAGLKWAAAEAGADIFFVPNN 297 

LM +L+IY+Q+TK D KG IAGTGTI DG+VG IGG KWAA +AG DIFF PN 
Sbjct: 237 LMMSLEIYNQLTKPDETKGYDIAGTGTIDVDGKVGPIGGIDQKWAaDKAGKDIFFAPNQ 296 

Query: 298 PVDKE I KKVNPNAISNYEEAKRAAKRLKTKMKI VPVTTVQEAIiVYLRK 345 

N + S+Y+ A + AK + + MKIVPV T+Q+A+ YL K 
Sbjct: 297 NGASNSDYKNAVKTAKDIDSNMKIVPVDTMQDAIDYMJK 335 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 229/339 (67%), Positives = 276/339 (80%)

Query: 17
Sbjct: 7

Query: 77
+A+ AQ+LYAWLTPFTE+S+ E+TTGG+S+ D+LRINQFYMETSQN +IYQAL LA K V
Sbjct: 67

Query: 137
+L YKGVYVL++ STFK LHLADTVTGVNGK F +S++LI YV+ L LGD+V VQ+T
Sbjct: 127

Query: 197
S K K+ VG++IKL NGKNGIGI LTDHT V S+ V F+T+GVGGPSAGLMFTL I YD
Sbjct: 187

Query: 257
KEDLRKGR IAGTGTI ++G VGDIGGAGLKVV+AA+ G
Sbjct: 247 QITKEDLRKGRTIAGTGTIGKDGEVGDIGGAGLKVVAAAEAGADIFFVPNNPVDKEIKKV 306

Query: 317 KTKVQTNYQEAKAAAKRLGTKMKIVPVQNVQQAIDYLKK 355
+NY+EAK AAKRL TKMKIVPV VQ+A+ YL+K
Sbjct: 307 NPNAISNYEEAKRAAKRLKTKMKIVPVTTVQEALVYLRK 345



A related GBS gene <SEQ ID 8479> and protein <SEQ ID 8480> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: 8.26
GvH: Signal Score (-7.5): -4.04

Possible site: 25
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -11.04 threshold: 0.0
INTEGRAL Likelihood =-11.04 Transmembrane 20 - 36 ( 12 - 43)
PERIPHERAL Likelihood =4.51 70
modified ALOM score: 2.71

Final Results

bacterial membrane --- Certainty=0.5416 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>


The protein has homology with the following sequences in the databases: 

GP|5531383| putative secreted protein {Streptomyces coelicolor A3(2)} Insert characterized

PIR|T36157|T36157 probable secreted protein - Streptomyces coelicolor Insert
characterized

ORF01344(361 - 1362 of 1671)

GP|5531383|emb|CAB51015.1| |AL096852(13 - 247 of 259) putative secreted protein
{Streptomyces coelicolor A3(2)} PIR|T36157|T36157 probable secreted protein - Streptomyces
coelicolor
%Match =7.1 

%Identity =38.4 %Similarity =57.6 

Matches = 58 Mismatches = 61 Conservative Sub.s = 29 



EKWRK*VICNRDPKRKHKSLLGLLKWWIIGFAFLLLVLASLVWLPYYLEMPGGAYDIRSVLKVNKKADKAKGSYNFV~- 
I : |: : |: :| ||: s |] |: 
MLSRLTRPQFLAVCGLPWALLATALFAPLPFSVAQPC-LTADV- - 



1002 

DVPV 

II I 



11111111=1=1 I 1=1 II 1= =111111 I II =11 II =1=1 =1=11 
KLRLEDVGGPSAGLLFSLGIVDKLGAGDLTGGKWAGTGTITDGGKVGAVGGVPLKTQAARRDGATVFLVPK-- 



1272 1302 1332 1362 1392 1422 1452 1482 

KGKTKVQTNYQFAKAAA[OlLGT™KIVPV0NVQQAIDYLKKTK*TQRVMSflRLFCFATFDYQSAKMIV*QSL*EYYI*M 
I I I ====11 == 1=1 II = = 

AECSDAQAELPKGLRLIPVTTLEGAVDSLKALESGKGDVPAC 

220 230 240 250 



SEQ ID 8480 (GBS39) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 12 (lane 9; MW 65.2kDa) and Figure 15 (lane 3; MW 40kDa).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 45 

A DNA sequence (GBSx0044) was identified in S.agalactiae <SEQ ID 143> which encodes the amino acid 
sequence <SEQ ID 144>. This protein is predicted to be UDP-sugar hydrolase. Analysis of this protein 
sequence reveals the following: 

Possible site: 17

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3908 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: 1 MTELIRILHLNDLHSHFENFPKVKRFFH----DNQAQPIETISLDLGDNIDKSHPLTEAS 56

M E +R+ H NDLHSHFEN+PK+ + ++Q+ ET+ D+GD++D+ +TEA+ 

Sbjct: 1 MKEKLRLYHTNDLHSHFENWPKIVDYIEQKRKEHQSDC3EETLVFDIGDHLDRFQFVTEAT 60 

Query: 57 SGKANVQLMNELGIELATIGNNEGVGLSKKDLDQVYKDSDFTVIVGNLKD-1IIIEPSWAK 115 

GKANV L+N L 1+ A IGNNEG+ L ++L +Y ++F VIV Ei D S PSWA 
Sbjct: 61 FGKANVDLIMlLHIDGAAIGl^GITLPHEEIAAr,YDHAEFPVIVSNLFDKNGNRPSWAV 120 

Query: 116 PYIIYETQQGTKIAFIAYTFPYYKTYEPNGWTIEDPIDCLKCHLQINEIK-FANCRILMS 174 

PY I + G +AFL T PYY Y+ GWT+ D ++ +K I E+K +A+ +L+S 
Sbjct: 121 PYHIKSLKNGMSIAFLGVWPYYPVYDKLGWTVTDAI^SIK--ETILEVKGQADIIVLLS 178 

Query: 175 HLGIRFDTRIAQEFSEIDLIIGAHTHHLFEEGELIKGTYLAAAGKYGRFVGSIDITFDNH 234 

HLGI D +A+ EID+I+ +HTHHL E+G+++NG LA+A KYG +VG ++IT D+ 
Sbjct: 179 HLGILDDQAVAEAVPEIDVILESHTHHLLEDGQVWGVLLASAEKYGHYVGCVEITVDS- 237 



No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 46 

A DNA sequence (GBSx0045) was identified in S.agalactiae <SEQ ID 145> which encodes the amino acid 
sequence <SEQ ID 146>. This protein is predicted to be UDP-sugar hydrolase. Analysis of this protein
sequence reveals the following: 

Possible site: 44

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.48 Transmembrane 5 - 21 ( 5 - 21)

Final Results

bacterial membrane --- Certainty=0.1192 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9605> which encodes amino acid sequence <SEQ ID 9606>
was also identified.

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB15227 GB:Z99120 similar to hypothetical proteins [Bacillus subtilis]
Identities = 29/137 (21%), Positives = 71/137 (51%), Gaps = 13/137 (9%)

Query: 3 AMLFYAGaDVAIINSGLIVQPFEKD-FSRKNLHESLPHQMRIAKLTVSSQELLEIYETiy 61
A+ + D++++NSG+I+ P + ++ +IM PH + + ++ +EL E ++
Sbjct: 305 ALKEWCETDISMVNSGVILGPLKAGPVTKLDLKRICPHPINPVAVRLTGEELKETI-- VH 362

Query: 62 QQGQFIAQQKIHGMGFRGKCFGEVLHSGFDYKK GKIVYNEKDIDAKEEVI 111

+ + Q +I G+GFRG+ G+++++G 4 + +I N +DI+ ++

Sbjct: 363 AASEQMEQLRIKGLGFRGEVMGKMVYAGVEV3TKRLDDGITHVTRITLNGEDIEKHKQYS 422

Query: 112 LVIVDQYYFASYFECLK 128
Sbjct: 423 VAVLDMFTLGKLFPLIR 439

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 47 

A DNA sequence (GBSx0046) was identified in S.agalactiae <SEQ ID 147> which encodes the amino acid 
sequence <SEQ ID 148>. This protein is predicted to be an unnamed protein product. Analysis of this protein
sequence reveals the following: 

Possible site: 29 

■»> Seems to have no It-terminal signal sequence 

30 

Final Results 

bacterial cytoplasm Certainty=0. 3567 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

35 

The protein differs from AX026665 at the C-terminus: 

Query: 181 SAKQHFVIRKK 191 

SAKQH + +K 
Sbjct: 181 SAKQHLLFVRK 191 

A related DNA sequence was identified in S.pyogenes <SEQ ID 149> which encodes the amino acid
sequence <SEQ ID 150>. Analysis of this protein sequence reveals the following:

Possible site: 37

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3974 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 110/205 (53%), Positives = 147/205 (71%), Gaps = 15/205 (7%)

Query: 1 MRKEVTPEMLNYNKYPGPQFIHFENIVKSDDIEFQLVINEKSAFDVTVFGQRFSEILLKY 60
M+KE++PEM NYNK+PGP+FIHFE VK++ I+ L+ + K+AFD T FGQR++E+LLKY
Sbjct: 9 MKKEISPEMYNYNKFPGPKFIHFEEQVKAEGIDLLLLEDVKNAFDTTSFGQRYTEVLLKY 68

Query: 61 DFIVGDWGNEQLRLRGFYKDASTIRKNSRISRLEDYIKEYCNFGCAYFVLENPNPRDIKF 120 

D+IVGDWGNEQLRL+GFYKD+ I+K +RISRLEDYIKE+CNFGCAYFVLEN +P+DIKF 
Sbjct: 69 DYIVGDWGNEQLRLKGFYKDSDDIKKTNRISRLEDYIKEFCNFGCAYFVLENLHPQDIKF 128 



Query: 167 QERHIKEEQDKEMTSAKQHFVIRKK 191 

+E+ K Q ++ + HF+IRKK 
Sbjct: 189 KEKS-KRNQTSQLNTKISHFIIRKK 212 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 48 

A DNA sequence (GBSx0047) was identified in S.agalactiae <SEQ ID 151> which encodes the amino acid 
sequence <SEQ ID 152>. Analysis of this protein sequence reveals the following: 
Possible site: 32 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3627 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9607> which encodes amino acid sequence <SEQ ID 9608> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB06225 GB:AP001515 unknown conserved protein [Bacillus halodurans] 
Identities = 205/349 (58%), Positives = 258/349 (73%), Gaps = 5/349 (1%)

Query: 18 PSIYSLTRDELIAWAIEHGEKKFRASQIWDWLYKKRVQSFDEMTNISKDFIALLNENFVV 77
PSIY+L +EL W E GE KFRA+QI++WLY+KRV+ F EMTN+SKD A L ++F +
Sbjct: 17 PSIYTLQFEELEMWLKEQGEPKFRATQIFEWLYEKRVKQFQEMTNLSKDLRAKLEKHFNL 76

Query: 78 NPLKQRIVQESADGTVKYLFELPDGMLIETVLMRQHYGLSVCVTTQVGCNIGCTFCASGL 137 

LK Q+S+DGT+K+LFEL DG IETV+MR +YG SVCVTTQVGC +GCTFCAS L 
Sbjct: 77 TTLKTVTKQQSSIXSTIKFLFEkHDGYSIETVV^^ 136 

Query: 138 IKKQRDLNNGEITAQIMLVQKYFDERGQGERVSHI WMGIGEPFDNYTNVLKFLRTVNDD 197 

+R+L GEI AQ++ Q+ DE QGERV 1WMGIGEPFDNY ++ FL+TVN D 
Sbjct: 137 GGLKRNLEAGEIVAQWEAQRAMDE- -QGERVGSIVVMGIGEPFDNYQALMPFLKTVNHD 194 

Query: 198 NGLAIGARHITVSTSGLAHKIREFANEGVQVNLAVSLHAPNNDLRSSIMRINRSFPLEKL 257 

GL IGARHITVSTSG+ KI +FA+EG+Q+N A+SIHAPN +LRS +M +NR++PL KL 
Sbjct: 195 KGLNIGARHIWSTSGWPKIYQFADEGLQINFAISIiiAPKTELRSKLMPvNRAWPLPKL 254 

Query: 258 FAAIEYYIETTNRROTFEYIMLNGV^TPENAQEIADLTKKIRKLSYVNLIPYNPVSEHD 317 

AI YYI+ T RRVTFEY + G MB E+A+ELADL K 1+ +VNLIP N V E D 
Sbjct: 255 MDAIRYYIDKTGRRVTFEYGLFGGEITOQWHAES]J\DLIKDIK--CHVNLIPVNYVPERD 312 

Query: 318 QYSRSPKERVEAFYDVLKKNGTOCVVRQEHGTDIDAACGQLRSNTMKRD 366 

Y R+P++++ AF LK+ GVN +R+E G DIDAACGQLR+ K + 
Sbjct: 313 -YVRTPRDQIFAFERTLKERGvNVTIRREQGHDIDAACGQLRAKERKEE 360 
A related DNA sequence was identified in S.pyogenes <SEQ ID 153> which encodes the amino acid
sequence <SEQ ID 154>. Analysis of this protein sequence reveals the following:

Possible site: 17 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2320 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 



VNPLKQR+VQESADGTVKYLFELPDGML I ETVLMRQHYG SVCVTTQVGCNIGCTFCASG 



LIKKQRDLN+GEITAQIMLVQKYFD+R QGERVSH+WMGIGEPFDNY NV+ FLR +ND 



DNGLAIGARHITVSTSGLAHKIR+FANEGVQVNLAVSLHAPNNDLRSSIMR+NRSFPLEK 



LF+AIEYYIE TNRRVTFEYIMLN VND+ + AQELADLTK IRKLSYVNLIPYNPVSEH 



DQYSRSPKERV AFYDVLKKNGVNCWRQEHGTDIDAACGQLRS TMK+DR+K 
DQYSRSPKERVLAFYDVLKKNGVNCWRQEHGTDIDAACGQLRSKTMKKDREK 354

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 49 

A DNA sequence (GBSx0048) was identified in S.agalactiae <SEQ ID 155> which encodes the amino acid
sequence <SEQ ID 156>. This protein is predicted to be VanZF. Analysis of this protein sequence reveals
the following:

Possible site: 47

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -9.61 Transmembrane 86 - 102 ( 77 - 106)
INTEGRAL Likelihood = -8.60 Transmembrane 19 - 35 ( 15 - 42)
INTEGRAL Likelihood = -5.15 Transmembrane 113 - 129 ( 109 - 134)

Final Results

bacterial membrane --- Certainty=0.4843 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
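
The INTEGRAL rows list each predicted membrane-spanning segment with its likelihood value and residue range. As a minimal sketch (the `Segment` type and `parse_segments` function are hypothetical names, and the pattern is an assumption about the row layout shown here), the rows can be collected and ordered along the sequence:

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float  # value printed after "Likelihood ="
    start: int         # first residue of the predicted segment
    end: int           # last residue of the predicted segment


LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*'?(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text):
    """Return every INTEGRAL row found in the text as a Segment."""
    return [Segment(float(lik), int(s), int(e)) for lik, s, e in LINE.findall(text)]


listing = """
INTEGRAL Likelihood = -9.61 Transmembrane 86 - 102 ( 77 - 106)
INTEGRAL Likelihood = -8.60 Transmembrane 19 - 35 ( 15 - 42)
INTEGRAL Likelihood = -5.15 Transmembrane 113 - 129 ( 109 - 134)
"""
# Order the segments by position rather than by likelihood.
segments = sorted(parse_segments(listing), key=lambda seg: seg.start)
for seg in segments:
    print(seg.start, seg.end, seg.end - seg.start + 1)
```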

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF36806 GB:AF155139 VanZF [Paenibacillus popilliae]
Identities = 45/154 (29%), Positives = 68/154 (43%), Gaps = 36/154 (23%)

Query: 17 RRFVWMLVIIYCLIIVRMCFGPQIMIEGVSTPNVQRFGRIVaL LVPFNSFRSL 69 

R F+W+ V ++ L +V M G NV GR L L+PF+S 
Sbjct: 36 RHFLWVYVFLFYLALVYMMTC IGNVWWGRYETLIRVSEINLLPFSS 82

Query: 70 DQLTSFKEIFWIGQWVNILLLFPLIIGLLSLKPSLRKYKSVILIAFLMSIFIECTQW 129 

+ +T++ ++NI+L PL L ++ P R K+ F S+ IE TQ++ 

Sbjct: 83 EGVTTY IMIILFMPLGFLLPTIWPQFRTIKNTACTGFFFSLAIELTQLL 132 

Query: 130 LDILIDANRVFEIDDLWTNTLGGPFALWTYRNIK 163 

+R+ +IDDL NTLG YR K 

Sbjct: 133 NHRITDIDDLLMNTLGAIIGYLLYRAFK 160 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 50

A DNA sequence (GBSx0049) was identified in S.agalactiae <SEQ ID 157> which encodes the amino acid 
sequence <SEQ ID 158>. This protein is predicted to be multidrug resistance-like ATP-binding protein mdl. 
Analysis of this protein sequence reveals the following: 

Possible site: 30 



>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =



18 - 

Transmembrane 24 7 - 
Transmembrane 160 - 
Transmembrane 141 - 
Transmembrane 56 - 
Transmembrane 278 - 



158 - 176 
134 - 158 



277 - 294 



Final Results

bacterial membrane --- Certainty=0.3718 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 



Query: 1 MSIIKNLWWFFKEEKKRYLIGILSLSLVAVIjNLIPPKIMGSVIDAITTGKLTRPQLLWNL 60 

M + +LWWFFK+EKK Y GI+ L++V++L L+PP+++G ++D I G LT P LL + 
Sbjct: 1 MKVFVDLW'JFFKQEKKSYGFGIVMLAIVSLLTLVPPRWGIIVDHIYEGTLTMPVLLQWI 60 

Query: 61 LGLVlSALAMYGLRYIWRMYILGTSYKLGQVWYRLFEHFTKMSPSFYQKYRTGDIjMAHA 120 

L AL +Y RY+WR+ I G S +L +++R +L+ HFT M+ FYQK+RTGDLMAHA 
Sbjct: 61 GVLAALALIVWARYLWRVMIFGASLRUOU^^ 120 

Query: 121 TOTINSLTRLAGGGVMSAVDASITALTCLIWFTISKQMTLIAVIPLPLMALATSKLGR 180 

TNDI ++ AG GV++ VD+ ++TM TISW++TLI+++P+PLMAL TS G 

Sbjct: 121 TM)IRAIQATAGQGVLTLVDSLTMGGFVILTMAITISWELTLISLLPMPLMALLTSYYGS 180 

Query: 181 KTHETFICESQAAFSEIOTKVQESVSGVKVTKSFGYQEQEIASFQEWQMTFVKNMRTMTY 240 

H+ F +QAAFS LN+KVQESV+GV+VTK+FG +EQ+I +F++ + KN+ 
Sbjct: 181 LLHKRFHHAQAAFSSLNDICVQESVTGWVTKAFGQEEQDIEAFRKQSDDVVKKNVAVARV 240 

Query: 241 DvMFDPLVLLFIGASYVLTLAMGAFMISKGQVTvGDLVTFVTYLDMLVWPLMAIGFLFNM 300 

D +FDP + L +G SY L + GA + Q+T+G L +F YL +L+WP++A GFLFN+ 
Sbjct: 241 DALFDPTISLIVGLSYFLAIVFGARFVIAEQLTIGQLTSFTIYLGLLIWPMLAFGFLFNI 300 
Query: 301 VQRGSVSYNRINSLLEQESDITDPLNPIRPWNGTLRYDIDFFRYDN--EETLADIHFTL 358

V+RG SYNR++ LL+ + +ITD I G+ IDFYNE LAD+ F L 

Sbjct: 301 VERGRASYNRVSQLLQAKQEITDSRARIHVPPTGHVDVAIDQFVYPHQKEPALADVQFEL 360 

Query: 359 EKGQTLGLVGQTGSGKTSLIKLLLREHDVTQGKITLNKHDIRDYRLSELRQLIGYVPQDQ 418 

+G+TLG+VG+TG+GKT+L++LL RE+D+ QG I L+ I Y L L+ G VPQD 
Sbjct: 361 SEGETLGI VGKTGAGKTTLLRLLQRE YDI KQGT 1 1 LDGRP IEHYTLDALKAAFGTVPQDH 420 



Query: 479 RIAMSRAMILDPDILILDDSLSAVDAKTEHAIIENLKTNRQGKSTIISAHRLSAWHADL 538 

R++++RA++ +P+ILILDDSLSAVDAKTE AI+ +L+ R+GK+TI I +AHRLSA+ HAD 
Sbjct: 481 RVSIARALLANPNILILDDSLSAVDAKTEEAILSSLRAERKGKTTIITAHRLSAIKHADH 540 

Query: 539 ILVMQDGRVIERGQHQELLNKGGWYAETYASQQLE 573 

ILVM DGR++ERG H+ L+ GGWY Y QQLE 
Sbjct: 541 ILVMDDGRIVERGTHETLMEAGGWYRKMYERQQLE 575 

There is also homology to SEQ ID 8. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 159> which encodes the amino acid
sequence <SEQ ID 160>. Analysis of this protein sequence reveals the following:
Possible site: 23 

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = 
INTEGRAL Likelihood = 
INTEGRAL Likelihood = 
INTEGRAL Likelihood = 
INTEGRAL Likelihood = 



176 - 192 
Transmembrane 267 - 283 
Transmembrane 18 - 34 
Transmembrane 151 - 167 
Transmembrane 85 - 101 



150 - 169: 



Final Results 

bacterial membrane --- Certainty=0.4100 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 172/609 (28%), Positives = 315/609 (51%), Gaps = 58/609 (9%)

Query: 1 MSIIKNLWWFFKEEKKRYL1GILSLSLVAVLNLIPPKIMGSVIDAITTGKLTRPQLLWNL 60 

M + W++FK + + + +4+ L L + P +G + + GK+ + 4- + 
Sbjct: 2 MKTARFFWFYFKRYRFSFTVIAVAVILATYLQViCAPVFLGESLTEL--GKIGQAYYVAKM 59 

Query: 61 LGLV LSAL- -AMYGLRYIWRMYILGT SYKLGQW RYRLFEHFTKM 103 

G LSA M+ L + +L S+ h +W R LF ++ 

Sbjct: 60 SGQTHFSPDLSAFNAVMFKIiMTYFFTVLANLIYSFLLTRWSHSTNRMRKGLFGKLERL 119 

Query: 104 SPSFYQKYRTGDLMAHATNDINSLTRLAGGGVMSAVDASITALVTLITMFFTISWQM 160 

+ +F+ +++ G++++ T+D+++ + ++++ S+ +VT I ++ + W M 

Sbjct: 120 TVAFFDRHKDGEILSRFTSDLDN IQNSLNQSLIQWTNIALYIGLVWMMFRQ 171 

Query: 161 TLIAVIPLPLMAIATS-KLGRKTHETFKESQAAFSELNNKVQESVSGVKVTKSF 213 

IA P+ L+ L + +L RK Q S LN + E++SG K 

Sbjct: 172 DSRLALLTIASTPVALIFLVINIRLARKYTNI QQQEVSALNAFMDETI SGQKAI I VQ 228 

Query: 214 GYQEQEIASF QEVNQMTFVKNMRT MTYDVMFDPLVLLFIGASYVLT-LAM 262 

GQE++F +VQTF+ + + M + + +++F+G++ VL+ +M 

Sbjct: 229 GVQEDTMTAFLIOJNERVRQATFKRRLFSGQLFPVMNGMSLINTAIVIFVGSTIVLSDKSM 288 

Query: 263 GAFMISKGQVTVGDLVTFVTYLIMLVWPOIAIGFLFNWQRGSVSYNRINSLLEQESDIT 322 

A +G +VTFV Y P+M I + +Q +RI + ++ ++ 

Sbjct: 289 PA AAALGLVVTFVQYSQQYYQPM'IQIASSWGELQLAFTGAHRIQEMFDETEEVR 342 
Query: 323 DPtJSPIRPWWGTLRYD-IDFFRYDNEETLADIHFTLEKGQTLGLVGQTGSGKTSLIKLL 381

P + + + +DF ++ L+D+ KG+ + +VG TGSGKT+++ L+ 

Sbjct: 343 PQNAPAFTSLKEAVAIimVDFGYLPGQKVLSDVSIVAPKGKMIAWGPTGSGKTTIMNLI 402 

Query: 382 LREHDOTQGKITMKHDIRDYRLSELRQLIGOTPQDQFLFATSILENVRFGNPTLSINAV 441 

R +DV G IT + DIRDY L LRQ +G V Q+ LF+ +1 +N+RFG+ T+S + V 
Sbjct: 403 NRFYDVDAGSITFDGRDIRDYDLDSLRQKVGIVLQESVLFSGTITDWIRFGDQTISQDMV 462 



Query: 442 
Sbjct: 463 

Query: 502 VDAKTEHAIIENLKTNRQGKSTIISAHRI.SAWHADLILVMQDGRVIERGQHQEIaLNKBG 561 

VD TE I ++ G+++ + AHRL +++AD I+V++DG+VIE+G H ELL++ G 
Sbjct: 523 VDTVTESKIQRAMEAIVAGRTSFVIAHRLKTIIiNADHIIVLKDGKVIEQGNHHELLHQKG 582 

Query: 562 WYAETYASQ 570 

+YAE Y +Q 
Sbjct: 583 FYAELYHNQ 591 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 51 

A DNA sequence (GBSx0050) was identified in S.agalactiae <SEQ ID 161> which encodes the amino acid
sequence <SEQ ID 162>. This protein is predicted to be mdlB (ATP-bindingprot). Analysis of this protein 
sequence reveals the following: 

Possible site: 39 



>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -8.65 Transmembrane 164 - 180 ( 155 - 183)
INTEGRAL Likelihood = -5.15 Transmembrane 25 - 41 ( 21 - 46)
INTEGRAL Likelihood = -4.88 Transmembrane 143 - 159 ( 133 - 163)
INTEGRAL Likelihood = -1.49 Transmembrane 251 - 267 ( 251 - 270)
INTEGRAL Likelihood = -1.33 Transmembrane 61 - 77 ( 61 - 77)

Final Results

bacterial membrane --- Certainty=0.4461 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB06054 ABC transporter (ATP-binding protein) [Bacillus halodurans] 
Identities = 278/582 (47%), Positives = 398/582 (67%), Gaps = 6/582 (1%)

Query: 1 MMKSNQWQVFIO^LISYLRPYKWFTVIJ^SLLLLTTVVKNIIPLIASHFIDHYLT-NVNQT 59 

+ Q VFKRL+SY YK ++A LL + T + + P+I FID YLT T 
Sbjct: 9 LSSKEQRTVFKRLLSYAAHYKGQLMVAFLLLFIATGAQLLGPIIVKIFIDDYLTPRYFPT 68 

Query: 60 AVLILVG--YYSMYVLQTLIQYFGNLFFARVSYSIVRDIRRDAFANMERLGMSYFDRTPA 117 

VL L+G Y +++ +1 Y+ F +V+ SIV+ +R D F++++RLG+S+FD+TPA 
Sbjct: 69 DVLFLLGAGYLVLHLTAVIIDYYQLFLFQKVALSIVQRLRIDVFSSVQRLGLSFFDQTPA 128 

Query: 118 GSIVSRITNDTEAISDMFSGILSSFISAIFIFTVTLYTMLMLDIKLTGLVALLLPVIFIL 177 

G +VSRITNDTE+I +++ +L++F+ I M L++ L +LLP+IF L 

Sbjct: 129 GGLVSRITM3TESIKELYVTVIATWQNIIFLIGIFAAMFYLNVTLAIYCLVLLPLIFAL 188 

Query: 178 VNVYRKKSVTVIAKTRSLLSDINSKLSESIEGIRIVQAFGQEERLKTEFEEINKEHVVYA 237 

+ VYRK S A LS +N +++ESI+G+ I+Q F QE R++ EF IN EH + 

Sbjct: 189 MQVYRKYSSRFYADMSEKLSLLNGRINESI QGMAI I QMFRQERRMRKEFSAINDEHFLAG 248 

Query: 238 NRSMALDSLFLRPAMSLLKLLAYAVIjMAYFGFTGVKGGLTAGLMYAFIQYVNRLFDPLIE 297 
+SM LD L LRPA+ +L +LR ++++YFG + + G++YAF+ Y++R F+P+ +
Sbjct: 249 MKSMKLDGLLLRPAVDVLSILRLMLILSYFGIMSMDTAVEIGVWAFVNYLDRFFEPVNQ 308 

Query: 298 VTQNFSTLQTSMVSAGRVFDLIDETGFEPSQKHTE--AFVREGNIEFKNVSFSYDGKKQI 355 
+ S . Q ++VSAGRVF L+D P ++ E A + EGN+EF+NVSFSYDGK +

Sbjct: 309 ^1MMRLSMFQQAIVSAGRVFK1MDHRE1APDREGNEHPAIIGEGNVEFR1TVSFSYDGKTNV 368 

Query: 356 LDNVSFSVKKGETIAFVGATGSGKSSIINVFMRFYEFQSGQVLLDGKDIRDYSQEQLRKN 415 
L N+SF+VKKGET+A VG TGSGK+SIINV MRFY Q G++L+DGK + + +LR
Sbjct: 369 LKNISFTVKKGETVALVGHTGSGKTSIINVLMRFYPLQDGEILIDGKPLTSFENNELRAK 428



Query: 416 IGLVLQDPFLYHGTIKSNIKMY-QDITDQEVQDAAEFVDADQFIQKLPDKYDAAVSERGS 474 

+GLVLQDPFLY GTI SNI++Y Q I+D ++ AA FV AD FI++L Y+ V+ERG+ 
Sbjct: 429 VGLVLQDPFLYTGTIASNIRLYDQAISDDRIKRAASFVRADGFIERLSHGYETKVTERGA 488 

Query: 475 SFSTGQRQLLAFARTVASKPKILILDEATANIDSETEQIVQDSLAKMRQGRTT1AIAHRL 534 

+FS+GQRQLL+FART+ +P ILILDEATA++D+ETE+ +Q++L +M+QGRTTIAIAHRL 
Sbjct: 489 TFSSGQRQLLSFARTMVREPAILILDEATASVDTETEEAIQEALERMKQGRTTIAIAHRL 548 

Query: 535 STIQDANCIYVLDRGKIIESGNHESLLDLKGTYYRMYQLQAG 576 

STI+DA+ I VL +G+I+E G H+ L+ KG Y +MY LQ G 
Sbjct: 549 STIKDADQILVLHQGEIVERGTHDELIAKKGLYQKMYVLQKG 590 



There is also homology to SEQ ID 160. 

A related GBS gene <SEQ ID 8481> and protein <SEQ ID 8482> were also identified. Analysis of this
protein sequence reveals the following:



Lipop: Possible site: -1 Crend: 10 
McG: Discrim Score: -4.63 
GvH : Signal Score (-7.5): -5.85 

Possible site: 39 
>>> Seems to have no N-terminal signal sequence
ALOM program count: 5 value: -8.65 threshold: C
INTEGRAL Likelihood = -8.65 Transmembrane 164 - 180 ( 155 - 183)
INTEGRAL Likelihood = -5.15 Transmembrane 25 - 41 ( 21 - 46)
INTEGRAL Likelihood = -4.88 Transmembrane 143 - 159 ( 133 - 163)
INTEGRAL Likelihood = -1.49 Transmembrane 251 - 267 ( 251 - 270)
INTEGRAL Likelihood = -1.33 Transmembrane 61 - 77 ( 61 - 77)
PERIPHERAL Likelihood = 3.02
modified ALOM score: 2.23

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.4461 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

ORF01277(322 - 2028 of 2340) 

EGAD|108578|BS0971(2 - 667 of 673) hypothetical protein {Bacillus subtilis} OMNI|NT01BS1137
conserved hypothetical protein GP|2226165|emb|CAA74449.1||Y14080 hypothetical protein
{Bacillus subtilis} GP|2633307|emb|CAB12811.1||Z99109 similar to ABC transporter (ATP-binding
protein) {Bacillus subtilis} PIR|H69828|H69828 ABC transporter (ATP-binding
protein) homolog yheH - Bacillus subtilis

%Match = 28.5
%Identity = 40.8 %Similarity = 69.1
Matches = 234 Mismatches = 171 Conservative Sub.s = 162
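
The reported percentages can be cross-checked against the reported counts. The sketch below assumes, without confirmation from the program that produced these figures, that %Identity is Matches divided by the alignment length and %Similarity is Matches plus Conservative Sub.s divided by the same length; it then searches for lengths at which both values round to the printed ones. The function name `consistent_lengths` and the search range are illustrative choices.

```python
def consistent_lengths(matches, conservative, pct_identity, pct_similarity,
                       min_len, max_len):
    """Return alignment lengths at which both percentages round to the reported values."""
    hits = []
    for n in range(min_len, max_len + 1):
        identity = round(100 * matches / n, 1)
        similarity = round(100 * (matches + conservative) / n, 1)
        if identity == pct_identity and similarity == pct_similarity:
            hits.append(n)
    return hits


matches, mismatches, conservative = 234, 171, 162
# Number of aligned residue pairs accounted for by the three counts.
ungapped = matches + mismatches + conservative
lengths = consistent_lengths(matches, conservative, 40.8, 69.1,
                             ungapped, ungapped + 100)
print(ungapped, lengths)
```

Under these assumptions the only consistent length in the searched range is 573, six columns more than the 567 residue pairs counted, which would be consistent with a small number of gap columns in the alignment.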



RLLFQHIDYQLLCTQTLS*LCKTAESSSEVSIKSC*IKWGMLKM1PHSN*KITOKHLMKSNQWQVFKRLISYLRPYKWFT 

:: I I I h : 
MKIGKTLWRYALLYRKLL 
VLALSLLLLTTWKNIIPLIASHFIDHYLTNVNQT A

: |: || : : |:| || :: : :| 

ITAVLLLTVAVGAELTGPFIGKKMIDDHILGIEKTWYEAAEKDKNAVQFHGVSYV AAEKLTKQELFQFYQPEIKGM 



VLILVGYYSMYVLQTLIQYFGNLFFARVSYSIVRDIRRDAFANMERLGMSYFDRTPAGSIVSRITNDTEAISDMFSGILS 
||:: | : |: : || : :: : | : : :|:| | : ::::: : ||| ||| :|:||||||||l |=: =11 
VLLICLYGGLLVFSVFFQYGQHYLIO^SANRIIQraKQ 



750 777 807 837 867 897 927 957 

SFISAIFIFTVTLYTML-MLDIKLTGLVALLLPVIFILVNVYRKKSVTVIAKTRSLLSDINSKLSESIEGIRIVQAFGQE 
15 :|::: |: ::| | :||:|| : ::|:|:: :|| : : | ||: | | | | : | : : | | | : | : |:||| :: 

TFWS-GIYMFGIFTALFLLDVKLAFVCIAIVPIIWLKSVIYRRYASYY^QKIRSINSDINAKMNESIQGMTIIQAFRHQ 
240 250 260 270 280 290 300 



987 1017 1047 1077 1107 1131 1161 1191 

ERLKTEFEEINKEHVVYA1^SMALDSLFLRPAMSLLKLLAYA^7LMAYFGFTGVK--GGLTAGLMYAFIQYVNRLFDPLIE
: ||||:|: | : || : |:|| ::::: ||; |, :|| : | :: |:;|||: |:|||| |: 

KETMREFEELffiSHFYFQNP^IMLNSLMSM 

320 330 340 350 360 370 380 

1221 1251 1281 1311 1341 1371 1401 1431

OTQNFSTLQTSWSAGRVFDLIDETGFEPSQKNTEAFVREGNIEFKNVSFSYDGKKQIIJDnVSFSVKKGETIAFVGATGS 
= III:: lllllll=|:=l I = = = I :||::|||:| :::| ::||: :||||:|:|| III 

IWQFSKLELARVSAGRVFELLEEKOT'EFAGEPAKERAL-GRVEFRDVSFAYQEGEEVLKHISFTAQKGETVALVGHTGS 
400 410 420 430 440 450 460 

1461 1491 1521 1551 1581 1611 1638 1668 

GKSSIINVFMRFYEFQSGQVLLDGKDIRDYSQEQLRKNIGLVLQDPFLYHGTIKSNIKMYQD-ITDQEVQDAAEFVDADQ 
llllhl:: 111= I I 11 = 111 I = I = = = I I = = 1 = 11111 = 1= III 11= = = =l = = l = = = l I 1 = 
GKSSILISn:,LFRFYDAQKGDVI,IDGKSIYNMSRQELRSHMGIVLQDPYLFOTIGSNVSLDDERMTEEBaKNALRQVGAEP 
480 490 500 510 520 530 540

1698 1728 1758 1788 1818 1848 1878 1908 

FIQKLPDKYDAAVSERGSSFSTGQRQLLAFARTVASKPKILII.DEATANIDSETEQIVQDSLAKMRQGRTTIAIAHRLST 
= = = 111 = I =|| = = |:| = ||| = = ||| =| I |||||llll = ll = lll = = l =1 = = 11111 MM 
LLKiCLPKGINEPVIEKGSTLSSGERQLISFARALAFDPAILILDEATAHIDTETEAVIQKALDWKQGRTTFVIAHRLST
560 570 580 590 600 610 620 

1938 1968 1998 2028 2058 2088 2118 2148 

IQDANCIYVLDRGKIIESGHHESLLDLKGTYYRMYQLQAGKMEV*KI*TIQKA*SVRFRGWSSYSSKPFLYFTISV**GQ 
45 |::|: | |||:|:|:| |||| |= | = | MIIMI I 

IKNADQILVLDKGEIVERGNHEELMALEGQYYQMYELQKGQKHSIA 
640 650 660 670 

There is also homology to SEQ IDs 330, 4634 and 5788. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 52 

A DNA sequence (GBSx0051) was identified in S.agalactiae <SEQ ID 163> which encodes the amino acid 
sequence <SEQ ID 164>. Analysis of this protein sequence reveals the following: 

Possible site: 25

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0635 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9609> which encodes amino acid sequence <SEQ ID 9610>
was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAA25224 GB:M87483 anthranilate synthase beta subunit [Lactococcus lactis]
Identities = 101/191 (52%), Positives = 133/191 (68%), Gaps = 4/191 (2%)

Query: 14 MLLLVDiraDSFTYmKQYLSVYKEVFVIKMDVPNLFLIl^SAEAIVLSPGPGHPKDAGKM 73
M+L++DNYDSFTYNL QY+ V +V V+KND +L +AE A+A++ SPGEG P DAGKM
Sbjct: 1 MILIIDNYDSFTYl^VQYVGVLTDVAVviasroDDS 60

Query: 74 VELINQFIGKKPILGICLGHQALAECLGGRLNLANHVMHGKQSWVTINDHTSLFKGIDSP 133
LI QF G+KPILGICLG QA+ E GG+L LA+ VMHGK S V +F + S
Sbjct: 61 ETLIQQFAGQKPILGICLGFQAIVEOTGGKLRLAHQVMHGKNSQVRQTSGNLIFNHLPSK 120

Query: 134 TQVMRYHSLWTD LPENIAVIARSNEDNEIMAFHCPSLKVYAMQFHPESIGSIDGMK 190
VMRYHS+V+ + LP+ A+ A 4 +D EIMA ++Y +QFHPESIG++DGM
Sbjct: 121 FLVMRYHSIVMDEAVALPD-FAITAVATDDGEIMAIENEKEQIYGLQFHPESIGTLDGMT 179

Query: 191 MIENFLTLIND 201
MIENF+ +N+
Sbjct: 180 MIENFVNQVNE 190



A related DNA sequence was identified in S.pyogenes <SEQ ID 165> which encodes the amino acid 
sequence <SEQ ID 166>. Analysis of this protein sequence reveals the following: 
Possible site: 57 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3183 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 104/186 (55%), Positives = 131/186 (69%)

Query: 14 MLLLVDl^SFTYI^KQYLSVYKEVFVIKITOVPNLFLLAESAFAIVLSPGPGHPKDAGJiM 73 

M+LL+DNYDSFTYNL QYLS + E V+ N PNL+ +A+ A A+VLSPGPG PK+A +M ' 
Sbjct: 1 MILLIDNYDSFTYMAQYLSEFDETIVLYNQDPNLYDI»1AKKAKALVLSPGPGWPKEANQM 60 

Query: 74 VELINQFIGKKPILGICLGHQALAECLGGRimiANHVMHGKQSWvTINDHTSLFKGIDSP 133 

+LI F KPILG+CLGHQA+AE LGG L LA VMHG+QS + SLF+ + 

Sbjct: 61 PKLIQDFYQTKPILGVCLGHQAIAETLGGTLRLAKRVMHGRQSTIETQGPASLFRSLPQE 120 

Query: 134 TQVMRYHSLWTDLPENIAVIARSNSDNEIMAFHCPSLKVYAMQFHPESIGSIDGMKMIE 193 

VMRYHS+W LP+ 4-V AR +D EIMAF +L ++ +QFHPESIG+ DGM MI 
Sbjct: 121 ITVMRYHSIVVDQLPKGFSVTARDCDDQEIMAFEHHTLPLFGLQFHPESIGTPDGMTMIA 180 

Query: 194 NFLTLI 199 

NF+ I 
Sbjct: 181 NFIAAI 185 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



WO 02/34771 






PCT/GB01/04789 



Example 53 

A DNA sequence (GBSx0052) was identified in S.agalactiae <SEQ ID 167> which encodes the amino acid 
sequence <SEQ ID 168>. Analysis of this protein sequence reveals the following: 

Possible site: 58 

>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood = -8.17 Transmembrane 117 - 133 ( 108 - 140)
INTEGRAL Likelihood = -1.70 Transmembrane 150 - 166 ( 150 - 166)

Final Results

bacterial membrane --- Certainty=0.4270 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:



Query: 8 YIALMVALLIVLGFIPGIPLGFIPVPIVLQNLGVMLAGALLGSRKGFLAVAIFLLLVAIG 67

+IA+ AL+ VLGF+P + L F PVPI LQ LGVMLAG++L + FL+ +FLLLVA G 
Sbjct: 9 HIAIFTALMAVLGFMPPLFLSFTPVPITLQTLGVMLAGSILRPKSAFLSQLVFLLLVAFG 68 

Query: 68 APFLPGGRSGLVTLFGPTAGYLLTYPFAAFFIGLGLEKVKTTKLWVQFLIIMIFGVLLID 127 

AP LPGGR G FGP+AG+L+ YP A++ I L +++ + F +FG++ I 
Sbjct: 69 APLLPGGRGGFGVFFGPSAGFLIAYPLASWLISLAANRLRKVTVLRLFFTHIVFGIIFIY 128

Query: 128 ICGSIVLSFQTSLPLTKSLFSNLIFIPGDTLKASICLIIYRKFANRLT 175 

+ G V +F + L+++ F +L ++PGD +KA++ + K L+ 
Sbjct: 129 LLGIPVQAFIMHIDLSQAAFMSLAYVPGDLIKAAVSAFLAIKITQALS 176 



A related DNA sequence was identified in S.pyogenes <SEQ ID 169> which encodes the amino acid 
sequence <SEQ ID 170>. Analysis of this protein sequence reveals the following: 
Possible site: 51 
>>> Seems to have an uncleavable N-term signal seq



INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =



113 - 129 (109-3 

Transmembrane 55- 71 ( 52 - 

- 26 ( 6 - 

- 102 ( 81-1 

- 49 ( 28 - 

- 166 (150-1 



Final Results

bacterial membrane --- Certainty=0.5012 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 80/168 (47%), Positives = 108/168 (63%), Gaps = 1/168 (0%)

Query: 3 TRTTTYIALMVALLIVLGFI PGI PLGFIPVPIVLQNLGVMLAGALLGSRKGFLAVAI FLL 62 

T+ +A+M L+I+LGFIP IPLGFIPVPIVLQNLGVMLAG +LG +KG L+V +F L 
Sbjct: 4 TKELVKVAMMTTLIIILGFIPAIPLGFIPVPIVLQNLGTOLAGLMLGGKKGTLSVFLF-L 62 

Query: 63 LVAIGAPFLPGGRSGLVTLFGPTAGYLLTYPFAAFFIGLGLEKVKTTKLWVQFLIIWIFG 122 

++ + P G R+ + L GP+AGY++ Y L + +FL + IG 

Sbjct: 63 VIGLFLPVFSGSRTTIPVLMGPSAGYVIAYLLVPIVFSLLYRNWFSKSTPLAFLALLISG 122 

Query: 123 VLLIDICGSIVLSFQTSLPLTKSLFSNLIFIPGDTLKASICLIIYRKF 170 

V+L+D+ G+I LS T + L SL SNL+FIPGDT+KA I II K+ 
Sbjct: 123 VVliVDVLGAIWLSAYTGMSLvTSLLSNLVFIPGDTIKa.I IAT I IAVKY 170 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 54 

A DNA sequence (GBSx0053) was identified in S.agalactiae <SEQ ID 171> which encodes the amino acid 
sequence <SEQ ID 172>. Analysis of this protein sequence reveals the following: 

Possible site: 17 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3914 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 55 

A DNA sequence (GBSx0054) was identified in S.agalactiae <SEQ ID 173> which encodes the amino acid 
sequence <SEQ ID 174>. Analysis of this protein sequence reveals the following: 
Possible site: 15 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1864 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9611> which encodes amino acid sequence <SEQ ID 9612> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB05457 GB:AP001513 biotin synthase [Bacillus halodurans]
Identities = 133/316 (42%), Positives = 201/316 (63%), Gaps = 2/316 (0%)

Query: 17 NYIHIJffiEILSGKTSISYEQALEII^S-DENfnffiIYAJU^YLI<NQVSRNNIRLNVLLSAIC 75 

N+I LA E++ GK IS +AL ILNS D+ + A ++ ++ ij>j ++++ ak 
Sbjct: 2 NWIQIiAQEVIEGKR-ISENEALAimSPDDELLLLLQGAFTIRQTYYGKKVKLNMIMNAK 60 

Query: 76 QGLCAENCGYCSQSKESTADIDKFGLLPQNVILKQAIVAHQNGASVFCIAMSGTKPSKRE 135 

G C ENCGYCSQS S A ID + ++ + IL+ A AH+ +CI SG P+ R+ 

Sbjct: 61 SGFCPENCGYCSQSSISKAPIDAYPMVNKFjTILEGAKRAHEtNVGTYCIVASGRGPTNRD 120 

1+ + + + EIK + L+IC G L EQ QLK AG+DR NHN+NT ++ I T+H 
Sbjct: 121 IDHVTEIATOEIKDTYGIjKICACLGILKPEQAEQLKAAGVDRYNHNVNTSARHHDQITTSH 180 

Query: 196 SFKDRCDTLERIHNEDIDVCSGFICGMGESDEGLITLAFRLKELDPYSIPVNFLLAVEGT 255 

+++DR +T+E + + I CSG I GM E+ E ++ +AF+L+ELD SIPVNFL A++GT 
Sbjct: 181 TYEDRVNTVEVVKHSGISPCSGVIVGMKETKEDVVDMAFQLRELDADSIPVNFLHAIDGT 240 
Query: 256 PLGKYNYLTPIKCLKINffiMLRFVFPFKEIJ^SAGREVHFEl^ESLVTLLVDSTFLGNYLT 315

PL + LTPI CLK++++ R+V P KE+R+S GREV+ ++ + h +S F+G+YLT 

Sbjct: 241 PLQGVHELTPIYCLKVLSLFRYVCPTKEIRISGGREVNLKSLQPLGLYAANSIFIGDYLT 300 

Query: 316 EGGRNQHTDIEFLEKL 331

G+ + D + L+ L 
Sbjct: 301 TAGQEETADHQILKDIi 316 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 56 

A DNA sequence (GBSx0055) was identified in S.agalactiae <SEQ ID 175> which encodes the amino acid 
sequence <SEQ ID 176>. Analysis of this protein sequence reveals the following: 

Possible site: 24

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3440 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9613> which encodes amino acid sequence <SEQ ID 9614>
was also identified.

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 57

A DNA sequence (GBSx0056) was identified in S.agalactiae <SEQ ID 177> which encodes the amino acid 
sequence <SEQ ID 178>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1985 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 58

A DNA sequence (GBSx0057) was identified in S.agalactiae <SEQ ID 179> which encodes the amino acid 
sequence <SEQ ID 180>. Analysis of this protein sequence reveals the following: 

Possible site: 32 



Final Results 

bacterial membrane --- Certainty=0.1044 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAC11722 GB:AL445064 acetyl-CoA acetyltransferase related
protein [Thermoplasma acidophilum] 
Identities = 113/388 (29%), Positives = 181/388 (46%), Gaps = 31/388 (7%) 

RDWIGFGLRTPIGIKGKQFKHYR-PELLGAHLLNQIKKIESESNID SIICGNTV 57 

RDV+I RT IG G+ F + P+L GA IK + E+++D +1 GN + 

RDVFIVAAKRTAIGKFGRSFSKLKAPQLGGA AI KAVMDEAHVDPAS VEEVIMGNVI 57 

- -GTGGNIGRLMTLFSDYESYIPVQTIDMQCASSSSALFFGYLKISTGINEKVLVGGIES 115 
G G N + + T+++ CAS A+ +1+ G + V+ GG+ES 

Sbjct: 58 QAGNGQNPAGQAAFHGGLPNSVLKYTVNWCASGMLAVESAAREIALGERDLVIAGGMES 117 






M D+ + S++RA+ A + G + 1+ EG+ D+G+RK 



Query: 225 IGNVCLMHDAAAFLTLQSQKT--EFRIVHIVEVAG- - 



ALK+++ GL + GG +++E 
NALKHRHLKTGLATIiCHGGGGAHTLTLE 385 

A related DNA sequence was identified in S. pyogenes <SEQ ID 181> which encodes the amino acid 
sequence <SEQ ID 182>. Analysis of this protein sequence reveals the following: 

Possible site: 22 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.28 Transmembrane 345 - 361 ( 345 - 361) 



Final Results 

bacterial membrane --- Certainty=0.1510 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:BAB03328 GB:AB035449 acetyl-CoA c-acetyltransferase
[Staphylococcus aureus] 
Identities = 115/382 (30%), Positives = 184/382 (48%), Gaps = 29/382 (7%) 
Query: 1 MTDVYIAAGLRTPIGLVGKQFAKEQPEILGAKLINALQKKYPV---PIDQVICGNTVGTG 57

M I A RT G G 4PE L L + KYP ID V+ GN VG G 

Sbjct: 1 MNQAVIVAAKRTAFGKYGGTLKHLEPEQLLKPLFQHFKEKYPEVISKIDDWLGNWGNG 60 


Query: 58 GNIGRLMTLYSHLGESVSALTVDMQCASAGAALSVGYAKIKAGMASNLLVGGIESSS- -- 114 

GNI R L + L +S+ +T+D QC S ++ I+AG + GG+ES+S 

Sbjct: 61 GNIARKALLEAGLKDSIPGVTIDRQCGSGLESVQYACRMIQAGAGIWYIAGGVESTSRAP 120 

Query: 115 ---LQPESVYASADWRQGAYICVAQF3PDSISPFAMIEGAERVAREHGFTKEYLNHWTLRS 171

+P SVY +A Y+ A F+P+ P +MI+GAE VA+ + ++E 4 4 RS 

Sbjct: 121 WKI KRPHSVYETA- - LPEFYERASFAPEMSDP - SMI QGAENVAKMYDVSRELQDEFAYRS 177 

Query: 172 HQKASYCQEQALLADLILDIjSGA SDQGIRPRLSSKVLSKVPPILGEGHVISAANA 226 

HQ + + +4 IL ++ +D+ 44 4 + P++ +G ++AAN+

Sbjct: 178 HQLTAENVKNGNISQEILPITVKGEIFNTDESLKSHIPKDNFGRFKPVI-KGGTVTAANS 236 

Query: 227 CLTHDAAAFLQLSSQPSAFKL ID WEVAGDPQRS PLMVI KASQVLLEKHGLG 278 

C+ +D A L + + A++L D V V D + + A LL+++ h 

Sbjct: 237 CMK1TOGAVLLLIMEKDMAYELGFEHGLLFKDGVTVGVDSNFPGIGPVPAISNLLKRNQLT 296

Query: 279 MADMTAIEWNEAFAVIDGLFETHYPDLLBRYNI FGGALAYGHPYGASAAI 1 1 LHLMRALE 338 

4 ++ IE NEAF+ + + NI+GGALA GHPYGAS A ++ L + 

Sbjct: 297 IENIEVIEINFAFSAQWACQQALNISNTQLJJIWGGALASGHPYGASGAQLVTRLFYMFD 356 


Query: 339 I KTTGRYGIAAI AAAGGQGFAVL 360 

+ IA++ GG GAL 

Sbjct: 357 KET MIASMGIGGGLGNAAL 375 

An alignment of the GAS and GBS proteins is shown below:

Identities = 182/362 (50%), Positives = 243/362 (66%), Gaps = 2/362 (0%) 

Query: 5 DVYIGFGLRTPIGIKGKQFKHYRPELLGAHLLNQIKKIESESNIDSIICGHTVGTGGNIG 64 
DVYI GLRTPIG+ GKQF +PE+LGA L+N ++ 4 ID 4ICGNTVGTGGNIG 
Sbjct: 3 DVYIAAGLRTPIGLVGKQFAKEQPEILGAKLIMALQN-ICYPVPIDQVICGNTVGTGGNIG 61

Query: 65 RLMTLFSDYESYIPVQTIDMQCASSSSALFFGYLKISTGINEKVLVGGIESSSLQPMRRY 124 

RLMTL4S 4 T4DMQCAS4 4AL GY KI G4 4LVGGIESSSLQP Y 

Sbjct: 62 RLMTLYSHLGESVSALTVDMQCASAGAALSVGYAKIKAGMASNLLVGGIESSSLQPESVY 121 


Query: 125 AKEDNT^NGEYTVAQFSPDSYAETVMLEGAQRVCQKYGFRREMLDKLAFLSHKRALTAKQG 184 

A D R G Y VAQFSPDS + M4EGA4RV 444GF 4E L4 SH44A +4 

Sbjct: 122 ASADWRQGAYKVAQFSPDSISPFAMIEGAERVAREHGFTKEYIiNHWTLRSHQKASYCQEQ 181 

Query: 185 GYLEEVILPMEGMRDQGVR-KLKETFFQKLPRLMENSPLLTIGNVCLMHDAAAFLTLQSQ 243

L ++IL 4 G DQG4R +h K4P 44 444 N CL HDAAAFL L SQ 

Sbjct: 182 ALLADLILDLSGASDQGIRPRLSSKVLSKVPPILGEGHVISAANACLTHDAAAFLQLSSQ 241 

Query: 244 KTEFRIWITOVAGDPKLSPELVHTATEKLLTETHTKISDYDAIEWNEPFAAIDALFNHY 303 
4 F444 4VEVAGDP4 SP 4V A44 LL 4 44D AIEWNE FA ID LF 4

Sbjct: 242 PSAFKLID^/VEVAGDPQRSPLWIKASQV^LEOGLGXADMTAIEWNEAFAVIDGLFETH 301 

Query: 304 YPEEREKFKIFGGTLAYGHPYACSGIINILHLMQALKYKNKPMGLTAIAGAGGVGMAISIEY 365 
YP4 444NIFGG LAYGHPY S I ILHLM4AL4 KN G4 AIA AGG G A4 44Y 
Sbjct: 302 YPDLIiDRYNI FGGAIiAYGHPYGASAAI I ILHLMRALEI KNGRYGI AAI AAAGGQGFAVLLKY 363

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
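
The summary line that precedes each alignment (for example "Identities = 182/362 (50%), Positives = 243/362 (66%), Gaps = 2/362 (0%)") can be read mechanically. The following is a minimal sketch in Python, not part of the original analysis; the names `FIELD_RE`, `parse_summary` and `fraction` are illustrative assumptions. The reported percentages are whole numbers and may differ by about one point from the exact ratio, so the sketch reports both rather than assuming a rounding rule.

```python
import re

# Matches e.g. "Identities = 182/362 (50%)"; tolerant of stray spaces.
FIELD_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_summary(line):
    """Map each field name to (count, alignment_length, reported_percent)."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in FIELD_RE.findall(line)
    }


def fraction(count, length):
    """Exact ratio behind a reported whole-number percentage."""
    return count / length


if __name__ == "__main__":
    summary = parse_summary(
        "Identities = 182/362 (50%), Positives = 243/362 (66%), Gaps = 2/362 (0%)"
    )
    for name, (count, length, percent) in summary.items():
        ratio = fraction(count, length)
        print(f"{name}: {count}/{length} = {ratio:.3f} (reported {percent}%)")
```

Fields that are absent from a summary line (some alignments report no gaps) are simply missing from the returned dictionary.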

Example 59 

A DNA sequence (GBSx0058) was identified in S.agalactiae <SEQ ID 183> which encodes the amino acid
sequence <SEQ ID 184>. Analysis of this protein sequence reveals the following: 

Possible site: 13 
149 - 165 ( 148 - 165)

Final Results 

bacterial membrane --- Certainty=0.2529 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12876 GB:Z99109 similar to long-chain fatty-acid-CoA ligase
[Bacillus subtilis] 
Identities = 90/382 (23%), Positives = 158/382 (40%), Gaps = 24/382 (6%) 

Query: 47 ISTHSLLNQLVRFVSKLCQKALPI I CKPNLTHNE I SRL3KEV- -QYAPQLADFGVLSSGT 104 

IS L+ L F +KL P++ N +IS + P+ + +SG+ 

Sbjct: '95 ISNADLWTLAFFKNKLTDSQTPvVLLDNCNIA-DISEAAADPLPTIDPEHPFYMGFTSGS 153 

Query: 105 TADAKLLWRSFTSWSDFFSIQNAYFSvTS^3SKLF"QGDFSFTGNLNIlALSLLLLGGTLW 164 

T K RS SW + F+ FS++S+ K+ I G + L A+S L LGGT+ + 

Sbjct: 154 TGKPKAFTRSHRSWMESFTCTETDFSISSDDKVLIPGALMSSHFLYGAVSTLFLGGTVCL 213 

Query: 165 TQKNSVKYWQTLWEKTGVTHLYLLPSYLKLVEQYSKETALDNKTIITSSQYVSDSLLEGL 224 

+K S + + ++ LY +P+ + + KI tit ++S + L 

Sbjct: 214 LKKFSPAKAKEWLCRESISVIiYTVPTMTDALARIEGFPDSPVKIISSGADWPAES-KKKL 272 



Query: 274 GRIFVKTPYSICG LSSEYCAGDYGELID--GKLYLFGRGGDWCNQSGIKLYLPRL 326 

G+IFVK+P G E+ D +D G LY+ GR G+ ++ + 

Sbjct: 333 GKIFVKSPMRFSGYVNGSTPDEWMTVDDMGYVDEEGFLYISGRENGMIVYGGLNIFPEEI 392 

Query: 327 IEKI KTCPYI KDAVAFTKESQSHGQESHCCI VLIENQMQQECLKWLSEHFEKKYGFKHYH 386 

+ CP ++ A + G+ + V++ N + W + K + 

Sbjct: 393 ERVLLACPEVESAAWGIPDEYWGEIA- -VAVILGNANARTLKAWCKQKLASYKIPKKWV 450 

Query: 387 IVSKIPLMPSGKIDYQQLKRQL 408 

+P SGKI ++K+ L 
Sbjct: 451 FADSLPETSSGKIARSRVKKWL 472 

A related DNA sequence was identified in S.pyogenes <SEQ ID 185> which encodes the amino acid 
sequence <SEQ ID 186>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2487 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 154/413 (37%), Positives = 235/413 (56%), Gaps = 9/413 (2%)

Query: 1 MLESLKTIVKTNSDKKLFDGD-LQVSYGEFYNLVR-QDMASQDNRKHVISTHSLLNQLVR 58 

ML L+ K +KK D + ++Y E + V +D +D+ ++IS LNQL+ 
Sbjct: 1 MLTKLEYWAKQCPNKKAIVADQISLTYQELWQATOIKDQTIKDSVPYIISHSRYLNQLLS 60 

Query: 59 FVSKLCQKALPIICKPNLT HNEISRLEKEVQYAPQLADFGVLSSGTTADAKLLWRSF 115

F+ L + + PII PN++ +1 ++ E+ + ADF VLSSGTT AKL WR 
Sbjct: 61 FLRGLKEGSCPIILHPNISGTFQQQIKHVDGELL KKADFAVLSSGTTGKAKLFWRKL 117 
Query: 116 TSWSDFFSIQNAYFSVTSNSKLFIQGDFSFTGNLNLAL3LLLLC3GTLVVTQKNSVKYWQT 175

++W+ F QN F +T NS LF+ G FSFTGNLNLAL+ L GG LV++QK S+K W + 
Sbjct: 118 STWTRLFDYQNKVFGMTGNSCLFLHGSFSFTGra^LALRQLmGGCLVLSQKLSLKTWLS 177 

Query: 176 LWEKTGVTHLYLLPSYLKLVEQYSKETALDNKTIITSSQYVSDSLLEGLYRKHPKVSVKI 235 

LW+ V+HLYLLP+YL + Y + + ++TSSQ +S LL Y+K P++ + I 
Sbjct: 178 LWQAKKVSHLYLLPTYLNRLLPYLTKNNMTATHLLTSSQMISQELLRHYYKKFPQLEIVI 237 

Query: 236 FYGASEIJSIYVSWYDGRDIRDKPQYVGEIVPNVAWIKEGRIFVKTPYSICGLSSEYCAGD 295 

FYGASEL++++W +GR VG+ P+V++ K+ IFV+TPYS+ G+S Y D 

Sbjct: 238 FYGASELSFITWCMGRAAVKINGLVGQPFPDVS1SFKDKEIFVETPYSVEGMSQPYSVSD 297 

Query: 296 YGEIaIDGKLYLFGRGGDWCNQSGIKIiYLPRI.IEKIiCTCPYIKDAVAFTKESQSHGQESHC 355 

G++ L L GR DW NQ G+K +LP L+E P +K+A A K + + 

Sbjct: 298 LGKMSPAGLILEGRQDDWVNQRGVKCHLPSLVELAHQAPNVKEAHAL-KIGKGENETLIL 356 

Query: 356 CIVLIENQMQQECLKWLSEHFEKKYGFKHYHIVSKIPLMPSGKIDYQQLKRQL 408 

+VL + +L+ + K+Y ++ +PL +GKI+ + L ++ 

Sbjct: 357 VLVLTKKDCIAPIKDFI^YLNSGQLPKYYLVIDCLPLKDNGKINREVLLNKl 409 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
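
The "Final Results" listings above rank three candidate locations (membrane, outside, cytoplasm) by a certainty value, and the entry marked "Affirmative" is the one with the highest value. The following is a minimal sketch in Python of how such a listing could be read, not part of the original analysis; `RESULT_RE`, `parse_results` and `predicted_location` are hypothetical names, and the tolerance for spaces inside the numbers is an assumption made to cope with the way the listings are typeset here.

```python
import re

# One listing line, e.g.
#   "bacterial membrane --- Certainty=0.2529 (Affirmative) < succ>"
# Spaces inside the number ("0. 2529", "0 . 0000") are tolerated.
RESULT_RE = re.compile(
    r"bacterial\s+(?P<site>[a-z]+)\s*-*\s*"
    r"Certainty\s*=\s*(?P<score>[0-9][0-9. ]*?)\s*\((?P<call>[^)]*)\)"
)


def parse_results(text):
    """Return a list of (site, certainty, call) tuples found in the text."""
    rows = []
    for match in RESULT_RE.finditer(text):
        score = float(match.group("score").replace(" ", ""))
        rows.append((match.group("site"), score, match.group("call").strip()))
    return rows


def predicted_location(text):
    """Return the (site, certainty) pair with the highest certainty."""
    site, score, _ = max(parse_results(text), key=lambda row: row[1])
    return site, score


if __name__ == "__main__":
    listing = """
    bacterial membrane --- Certainty=0. 2529 (Affirmative) < succ>
    bacterial outside --- Certainty=0 . 0000 (Not Clear) < succ>
    bacterial cytoplasm --- Certainty=0 . 0000 (Not Clear) < succ>
    """
    print(predicted_location(listing))
```

When every certainty is zero, `max` returns the first entry, so a caller that needs to distinguish "no prediction" should also inspect the call text.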

Example 60 

A DNA sequence (GBSx0059) was identified in S.agalactiae <SEQ ID 187> which encodes the amino acid 
sequence <SEQ ID 188>. This protein is predicted to be endonuclease III (pdg). Analysis of this protein 
sequence reveals the following: 

Possible site: 46 



Final Results

bacterial membrane --- Certainty=0.1001 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB05417 GB:AP001512 endonuclease III (DNA repair) [Bacillus halodurans] 
Identities = 95/202 (47%), Positives = 134/202 (66%)


Query: 1 MLSKAKSRYIIREIIKLFPDAKPSLDFTNVFELLVAVMLSAQTTDAAVNKVTPALFERFP 60 

ML+K +++ + I ++PDA+ L +N FELL+AV+LSAQ TDA VMKVTP LF ++ 
Sbjct: 1 MLTKKQTQFALAVIADmPDAECELTHSNPFELLIAVVLSAQCTDALWKVTPRLFAKYK 60 

Query: 61 NPLVTAQADPKEIEPYISKIGIiYRNKARFLNQCAKQLIEHFDGKVPRTRQELESLAGVGR 120

P +E+E I IGLYRNKA+ + + + L+E + G+VP+ R EL LAGVGR 

Sbjct: 61 TPEDYIAVPLEELEQDIRSIGLYRNKAKNIKKLCQSLLEQYGGEVPQDRDELVKLAGVGR 120 

Query: 121 K?TA]WVMSVGFGIPAFAVDXRVTRICKHHQICKQSASPLEIEKRVMEVLPPEEWLAAHQS 180 
KTANW SV FG+PA AVDTHV R+ K IC+ + ++E+ +M+ +P +EW +H

Sbjct: 121 KTANWASVAFGVPAI AVDTHVERVSKRLG I CRWKDNVTQVEQTLMKKI PMDEWS I SHHR 180 

Query: 181 MI YFGRAI CHPKNPKCDQYPQL 202 
+I+FGR C +NP+CD P L 
Sbjct: 181 LIFFGRYHCKAQNPQCDICPLL 202

A related DNA sequence was identified in S.pyogenes <SEQ ID 189> which encodes the amino acid 
sequence <SEQ ID 190>. Analysis of this protein sequence reveals the following: 

Possible site: 44 

>>> Seems to have a cleavable N-term signal seq.

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 91/199 (45%), Positives = 133/199 (66%)

LSKAKSRyilREIIKLFPDAKPSLDFTNVFELLVAVMLSAQTTDARVNKVTPALFERFPN 61 
+ KA+ ++ I ++FP+AK LD+ F+LL+AV+LSAQTTD AVNKVTP L++ +P 
IGKARIAKVLTIIGQMFPFAKGELDWETPFQLLIAVIIjSAQTTDKAvNKVTPGLWQSYPE 62 



1- F G+VP+T +ELESL GVGRK 



TANW++ +G+PA AVDTHV R+ K I A +IE +M +P ++W+ 



I+FGR C K PKC+ P 
IFFGRYHCLAKKPKCEICP 201 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 61 

A DNA sequence (GBSx0060) was identified in S.agalactiae <SEQ ID 191> which encodes the amino acid 
sequence <SEQ ID 192>. Analysis of this protein sequence reveals the following: 
Possible site: 51 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2264 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

. [Streptococcus mutans] 

Query: 1 MKVLFDVQNLLKKFGIYVYIGKRLYDIEVMKIELQRLYDNGLISRDDYLKAELILRREHR 60 

MK L+DVQ LLK+FGI+VY+GKRLYDIE+MKIEL+RLYDNGLIS+ DYL AELILRREHR 
Sbjct: 1 MKTLYDVQRLLKQFGIFVYLGKRLYDIEM^KIELERIiYDNGLISKSDYLHAELILRREHR 60 

Query: 61 LELEKEN 67 

+E E+EN 
Sbjct: 61 1EKEREN 67 

A related DNA sequence was identified in S.pyogenes <SEQ ID 193> which encodes the amino acid 
sequence <SEQ ID 194>. Analysis of this protein sequence reveals the following: 
Possible site: 57 



>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1962 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 53/66 (80%), Positives = 60/66 (90%)

Query: 1 MKVLFDVQNLLKKFGIYVYIGKRLYDIEVMKIELQRLYDNGLISRDDYLKAELILRREHR 60 

MK L+DVQ LLK FGI+OT+GKRLYDIE+MKIELQRLYD+GL+ + DYL AELILRREHR 
Sbjct: 7 MKTLYDVQQLLKNFGIFVYLGKRLYDIEMMKIELQRLYDSGLLDKRDYLNAELILRREHR 66 

Query: 61 LELEKE 66 

LELEKE 
Sbjct: 67 LELEKE 72 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 62 

A DNA sequence (GBSx0061) was identified in S.agalactiae <SEQ ID 195> which encodes the amino acid 
sequence <SEQ ID 196>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.06 Transmembrane 133 - 149 ( 133 - 150) 

Final Results 

bacterial membrane --- Certainty=0.1022 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB05144 GB:AP001512 glucose kinase [Bacillus halodurans] 
Identities = 145/315 (46%), Positives = 209/315 (66%), Gaps = 2/315 (0%) 

LGIDLGGTTIKFGILTLEGEVQEKWAIETNTLENGRHIVSDIWSLKHRLSLYGLTKDDF 65 
+G+D+GGTTIK LT GE+ +KW I TN + G I ++I ++L RLS + +K D 
VGVDVGGTTIKMAFLTTAGEIVDKWEIPTNKQDGGALITTNIADALDKRLSGHHKSKSDL 66 










+E+E +P +DNDAN+AALGE W 



■ +TLGTGVGGG+ +A+GN+ +HGV G GEIGH+ V PE G C CG GCL 



ETVASATG+ R+A + +++ S + D +T+KD+F AA+ D FA SW+ 4 



YLG A AN++N LNP+ +VIGGGVS AG+ L ++++F +A P+V 



NDAG+IG L QQ 
A related DNA sequence was identified in S.pyogenes <SEQ ID 197> which encodes the amino acid
sequence <SEQ ID 198>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.10S0 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 270/319 (84%), Positives = 292/319 (90%)

Query: 1 MSKKLLGIDLGGTTIKFGILTLEGEVQEKWAIETNTLENC3RHIVSDIVBSLKHRLSLYGL 60 

MS+KLLGIDLGGTTIKFGILT GEVQEKHAIETO LE G+HIV DI+ S+KHRL LYGL 
Sbjct: 1 MSQKLLGIDLGGTTIKFGILTAAGEVQEKKAIETNILEGGKHIVPDIIASIKHRLDLYGL 60 

Query: 61 TKDDFLGIGMGSPGAVDRTSKTWGAF^TIjNWADTQEVGSVIEKEVGIPFFIDl^ANVAAL 120 
+ DF+GIGMGSPGAVDR + TVTGAFNLNW +TQEVGSV+EKE+GIPF IDNDANVAAL 

Query: 121 GERWVGAGANNPDWFVTLGTGVGGGVIADGNIjlHGVAGAGGEIGHMIVDPENGFTCTCG 180 

GERWVGAG NNPDWF+TLGTGVGGG+IADGNLIHGVAGAGGEIGHMIV+PENGF CTCG 
Sbjct: 121 GERWVGAGENNPDWFMTLGTGVGGGI IADGNLIHGVAGAGGEIGHMIVEPENGFACTCG 180 

Query: 181 NKGCLEWASATGVVRVaRQLAEQYEGSSAIKAAIDNGr^TTOSKDIFIAAEDGDKFANSV 240 

+ GCLETVASATGW+VAR LAE YEG SAIKAAIDNG+ VTSKDIF+AAE GD FA+SV 
Sbjct: 181 SHGCLEWASATGWKVARLLAEAYEGDSAIKAAIDNGEGVTSKDIFMAAEAGDSFADSV 240 

Query: 241 VERVSRYLGIAAANISN1LNPDSWIGGGVSAAGEFLRSRVEICYFVTFAFPQVKKSTKIK 3 00 

VE+V YLGLA+ANI SNI LNPDSWIGGGVSAAGEFLRSR+EKYFVTF FPQV+ STKIK 
Sbjct: 241 VEKVGYYLGLASANISNILNPDSWIGGGVSAAGEFLRSRIEKYFVTFTFPQVRYSTKIK 300 

Query: 301 IAELGNDAGI IGAASLANQ 319 

IAELGNDAGI IGAASLA Q 
Sbjct: 301 IAELGNDAGI IGAASLARQ 319 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 63 

A DNA sequence (GBSx0062) was identified in S.agalactiae <SEQ ID 199> which encodes the amino acid 
sequence <SEQ ID 200>. Analysis of this protein sequence reveals the following: 
Possible site: 19 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB14385 GB:Z99116 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 51/124 (41%), Positives = 71/124 (57%), Gaps = 1/124 (0%)

Query: 3 MSVILIIVILLAFVAWASWNYWRVRRAAKFLDNESFQKEMSRGQLIDIRFAGAFHRKHIL 62 

MS +++++I AF+ + +Y +R K L E F+ + QLID+RE F HIL 
Sbjct: 1 MSNMIVLIIFPAFIIYMIASYVYQQRIMKTLTEEEFRAGYRKAQLIDVREPNEFEGGHIL 60 
Query: 63 GARNIPASQFKVALSALRKDKPVLLYDASRGQSIPRIVLLLRKEGFNQLY-^KDGFNYWT 122

GARNI P SQ K + +R DKPV LY + +S R LRK G ++Y LK GF W 
Sbjct: 61 GARNIPLSQLKQRKNEIRTDKPVYLYCQNSVRS-GRaAQTLRKNGCTEIYNLKGGFKKWG 119 

Query: 123 GRVK 126 
G++K 

Sbjct: 120 GKIK 123 

A related DNA sequence was identified in S.pyogenes <SEQ ID 201> which encodes the amino acid
sequence <SEQ ID 202>. Analysis of this protein sequence reveals the following: 

Possible site: 30 
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -4.41 Transmembrane 4 - 20 ( 1-22) 



Final Results 

bacterial membrane --- Certainty=0.2763 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 



Query: 8 LWLLLVGIVGYYTWNYFSFRKMAKQVDNETFKDVMRQGQLIDLREPAAFRTKHILGARNF 67 

+WL+L+ ++ Y + K K + E F R+ QLID+REP + + HILGARN 

Sbjct: 5 VWLVLLALLVYVLFKRLYTPKYLKTLTQEEFIQGYRKAQLIDVREPREYDSGHILGARNI 64 

Query: 68 PAQQFDAAIKGLRKTJKPVLIYENMRPQYRVPAVKKLKKAGFEDVYVLKDGIDYWDGKVKQ 127 

P Q +K +R D+PV +Y + R A KK G EDV LK G W GK+K+ 

Sbjct: 65 PLSQLKQRLKEVRTDQPVYLYCQSGARSRQAAAILKKKHGVEDi/NHLKGGFRKWTGKIKK 124 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 63/126 (50%), Positives = 85/126 (67%)

Query: 61 ILGARNIPASQFKVALSALRKDKPVLLYDASRGQSIPRIVLLLRKEGFNQLYVLKDGFNY 120 

ILGARN PA QF A+ LRKDKPVL+Y+ R Q V L+K GF +YVLKDG +Y 

Sbjct: 61 ILGARNFPAQQFDAAIKGLRKDKPVLIYEflMRPQYRvPAVKKLKKAGFEDvYVLKriGIDY 120 

Query: 121 WTGRVK 126 

W G+VK 
Sbjct: 121 WDGKVK 126 

A related GBS gene <SEQ ID 8483> and protein <SEQ ID 8484> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 1 
McG: Discrim Score: 17.55 
GvH: Signal Score (-7.5): 3.36 

Possible site: 17 
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 8.86 threshold: 0.0 
PERIPHERAL Likelihood =8.86 99 
modified ALOM score: -2.27 

*** Reasoning Step: 3 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

40.4/56.5% over 122aa
Bacillus subtilis
EGAD|45852| hypothetical 14.6 kd protein in gcvt-spoiiiaa intergenic region Insert characterized
SP|P54510|YQHL_BACSU HYPOTHETICAL 14.6 KDA PROTEIN IN GCVT-SPOIIIAA INTERGENIC REGION. Insert characterized
GP|1303893|dbj|BAA12549.1||D84432 YqhL Insert characterized
GP|2634888|emb|CAB14385.1||Z99116 similar to hypothetical proteins Insert characterized
PIR|C69959|C69959 glpE protein homolog yqhL - Insert characterized

ORF00659(307 - 678 of 978)
EGAD|45852|BS2449(1 - 123 of 126) hypothetical 14.6 kd protein in gcvt-spoiiiaa intergenic region {Bacillus subtilis} SP|P54510|YQHL_BACSU HYPOTHETICAL 14.6 KDA PROTEIN IN GCVT-SPOIIIAA INTERGENIC REGION. GP|1303893|dbj|BAA12549.1||D84432 YqhL {Bacillus subtilis} GP|2634888|emb|CAB14385.1||Z99116 similar to hypothetical proteins {Bacillus subtilis} PIR|C69959|C69959 glpE protein homolog yqhL - Bacillus subtilis
%Match = 13.3
%Identity = 40.3 %Similarity = 56.5
Matches = 50 Mismatches = 53 Conservative Sub.s = 20


108 138 168 198 228 258 288 318 

NISNIMPDSWIGV«CLSSR*IFT*SR*EILCHICFPTS*KOTI*N*DC*TR**CWYYWCSKLSQSTSKLRR*GMDMSVI 

II » 
MSNM 


348 378 408 438 468 498 528 558 

LIIVILIAFVAWAS*mT\KVRRA,WLDNESFQKE^ 

::: = |: ||: : =1 s| I I I |: : Nihil I lllllllll IN : =1 Mil 
IVLIIFPAFIIYMIASYVYQQRIMKTLTEEEFRAGYRKAQLIDVREPNEFEGGHILGARNIPLSQLKQRKNEIRTDKPVY 
20 30 40 50 60 70 80

588 618 648 678 708 738 768 798 

LYDASRGQSI PRIVLLLRKEGFNQLYVLKDGFNYWTGRVK'YTKERVTINNSLHFL* K* I KLKKVENKWHK* *NDEKFSY 
II I III I : = l II II I h = l 

LY-CQNSVRSGRAAQTLRKNGCTEIYNLKGGFKKWGGKIKAKK
100 110 120 

SEQ ID 8484 (GBS13) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 3 (lane 4; MW 16kDa). It was also expressed in E.coli as a GST-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 9 (lane 2; MW 40.5kDa).

The GST-fusion protein was purified as shown in Figure 190, lane 5. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 64 

A DNA sequence (GBSx0063) was identified in S.agalactiae <SEQ ID 203> which encodes the amino acid
sequence <SEQ ID 204>. This protein is predicted to be regulatory protein TypA (typA). Analysis of this 
protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1738 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB13350 GB:Z99111 similar to GTP-binding elongation factor
[Bacillus subtilis]
Identities = 455/609 (74%), Positives = 534/609 (86%), Gaps = 2/609 (0%)

Query: 4 LRTDIRNVAII7AHVDHGKTTLVDELLKQSHTLDERKELEER?\MDSNDIEKERGITILAKN 63 
LR D+RN+AI IAHVDHGKTTLVD+LIi Q+ T +++ ERAMDSND+E+ERGITILAKN

Sbjct: 3 LRM}LRNIAIIAHVDHGKTTLVDQLLHQAGTFRANEQVAERAMDSNDLERERGITIL7AKN 62 

Query: 64 TAVAYNDTOINIMDTPGHADFGGEWRIMKMVDGWLVVDAYEGTMPQTRFVLKKALEQN 123 
TA+ Y D RINI+DTPGHADFGGEVERIMKMVDGWLWDAyEG MPQTRFVLKKALEQN 
Sbjct: 63 TAINYKI)TRINILDTPGHADFGGEVERIMK^1VDGWLVVDAYEGCMPQTRFVLKKALEQN 122



Query: 124 LIPIVWNKIDKPSARPSEWDEVLELFIELGADDDQLDFPWYASAINGTSSMSDDPSD 183 

L P+VWNKID+ ARP EV+DEVL+LFIEL A+++QL4FPWYASAINGT+S+ DP 
Sbjct: 123 IjNPWWNKIDRDFARPEEVIDEVLDLFIELDANEEQLEFPVVYASAINGTASL--DPKQ 180 

Query: 184 QEKTMAPIFDTIIDHIPAPVDNSEEPLQFQVSIiIjDYNDFVGRIGIGRVFRGTVKVGDQVT 243 

Q++ M +++TII H+PAPVDN+EEPLQFQV+LLDYND+VGRIGIGRVFRGT+KVG QV+ 
Sbjct: 181 QDENMEALYETIIKHVPAPVDNAEEPLQFQVAIiLDYlCDYVGRIGIGRVFRGTMKVGQQVS 240 

Query: 244 LSKLDGTTKNFRVTKLFGFFGLERKEIQEAKAGDIjIAVSGMEDIFVGETVTPTDAIEPLP 303 

L KLDGT K+FRVTK+FGF GL+R EI+EAKAGDL+AVSGMEDI VGETV P D +PLP 
Sbjct: 241 LMKLDGTAKSFRVTKIFGFQGLKRVEIEEAKAGDLVAVSGMEDINVGETVCPVDHQDPLP 300 

Query: 304 vLRIDEPTLQMTFLVNNSPFAGREGI<WITSRIOTEERLLAELQTDVSLRVDPTDSPDKWTV 363 

VLRIDEPTLQMTF+VNNSPFAGREGK++T+RK+EERL ++LQTDVSLRV+PT SPD W V 
Sbjct: 301 VLRIDEPTLQMTFVVNNSPFAGREGKYVTARKIEERLQSQLQTDVSLRVEPTASPDAWW 360 



Query: 364 SGRGELHLSILIETMRREGyELQVSRPEVIIKEIDGVQCEPFERVQIDTPEEYQGAIIQS 423 

SGRGELHLS ILIE MRREGYELQVS+PEVIIKEIDGV+CEP ERVQID PEE+ G++++S 
Sbjct: 361 SGRGELHLSILIENMRREGYELQVSKPEVIIKEIDGVRCEPVERVQIDVPEEHTGSVMES 420 

Query: 424 LSERKGDMLDMQMVGNGQTRLIFLIPARGLIGYSTEFLSMTRGYGIMNHTFDQYLPWQG 483 

+ RKG+M+DM GNGQ RLIF +P+RGLIGYSTEFLS+TRG+GI+NHTFD Y P+ G 
Sbjct: 421 MGARKGEMVDMINNGNGQVRLIFTVPSRGLIGYSTEFLSLXRGFGILNHTFDSYQPMQAG 480 

Query: 484 EIGGRHRGALVSIENGKATTYSIMRIEERGTIFVNPGIEVYEGMIVGENSRDNDLGVNIT 543 

++GGR +G LVS+ENGKAT+Y I IE+RG IFV PG EVYEGMIVGE++RDNDL VN++ 
Sbjct: 481 OVGGRRQGVLVSMENGKATSYGIQGIEDRGVIFVEPGTEVYEGMIVGEHNRDNDLVVNVS 540 

Query: 544 TAKQMTNVRSATKDQTAVIKTPRILTLEESLEFLADDEYI'IEVTPESIRLRKQILNKAARD 603 

KQ TNVRSATKDQT IK RI++LEESLE+L +DEY EVTPESIRLRK+IIiNK R+ 
Sbjct: 541 KMKQQTNVRSATKDQTTTIKKARIMSLEESLEY^NEDEYCEVTPESIRLRKKILNKNERE 600 

Query: 604 KANKKKKSA 612 

KA KKKK+A 
Sbjct: 601 KAAKKKKTA 609 

A related DNA sequence was identified in S.pyogenes <SEQ ID 205> which encodes the amino acid 
sequence <SEQ ID 206>. Analysis of this protein sequence reveals the following: 

Possible site: 36 



>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1738 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 
Identities = 594/613 (96%), Positives = 607/613 (98%)

1 MTNLRTDIRWAIIAHVDHGKTTLVDELLKQSHTLDERKELEERAMDSNDIEKER6ITIL 60 

MTNLR DIRIWAIIAHVDHGKTTLVDELLKQSETLDERKEL+ERAMDSHD+EKERGITIL 
1 MTmRiroiRWAIIAHVDHGKrTLVDELDKQSHTLDERKELQERAMDSNDDEKERGITIL 60 

61 AKOTAVAYITOTOINIMDTPGHADFGGEVERIMKM^ 120 

AKOTAVAYNDWINIMDTPGHADFGGEVERIMKMVDGVVLVVDAYEGTMPQTRPVLKKAL 
61 AKOTAVAYNDVRINIlflOTGHM^ 120 



181 PSDQEKTMAPIFDT1IDHIPAPVDNSEEPLQFQVSLLDYNDFVGRIGIGRVFRGTVKVGD 240 

P+DQE TMAPIFDTIIDHIPAPVDNS+EPLQFQVSLLDYHDFVGRIGIGRVFRGTVKVGD 
181 PADQEHTMAPIFDTIIDHIPAPVDNSDEPLQFQVSLLDYNDFVGRIGIGRVFRGTVKVGD 240 

241 QVTLSKLDGTTKNFRVTKLFGFFGLERKEIQEAKAGD^IAVSGMEDIFVGETVTPTDAIE 300 

QVTLSKLDGTTKNFRVTKX.FGFFGLER+EIQEAKAGDLIAVSGMEDIFVGET+TPTD +E 
241 QVTLSKLDGTTKNFRVTKLFGFFGLERREIQEAKAGDLIAVSGMEDIFVGETITPTDCVE 300 

301 PLPVLRIDEPTLQMTFLvNNSPFAGREGKWITSRKVEERLIAELQTDVSLRVDPTDSPDK 360 

LP+LRIDEPTLQMTFLVNNSPFAGREGKWITSRKVEERLLAELQTDVSLRVDPTDSPDK 
301 ALPILRIDEPTLQMTFLVNNSPFAGREGKWITSRKVEERLLAELQTDVSLRVDPTDSPDK 360 



421 IQSLSERKGDMLDMQMVGNGQTRLIFLIPARGLIGYSTEFLSMTRGYGIMNHTFDQYLPV 480 

IQSLSERKGDMLDMQMVGNGQTRIilFLIPARGLIGYSTEFLSMTRGYGIMNHTFDQYLPV 
421 IQSLSERKGDMLDMQMVGNGQTRLIFLIPARGLIGYSTEFLSMTRGYGIMNHTFDQYLPV 480 

481 VQGEIGGRHRGALVSIENGKATTYSIMRIEERGTIFVNPGIEVYEGMIVGENSRDNDLGV 540 

VQGEIGGRHRGALVSIENGKATTYSIMRIEERGTIFVNPG EvYEGMIVGENSRDNDLGV 
481 VQGEIGGRHRGALVSIENGKATTYSIMRIEERGTIF^/NPGTEVYEGMIVGENSRDNDLGV 540 

541 NITTAKQMTNVRSATKDQTAVIKTPRILTLEESLEFLADDEYMEVTPESIRLRKQILNKA 600 

NITTAKQNTOTRSATKDQTAVIKTPRILTLEESLEFL DDEYMEVTPESIRLRKQILNXA 
541 NITTAKQMTNVRSATKDQTAVIKTPRILTLEESLEFLNDDEYMEVTPESIRIiRKQILNKA SOO 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 65 

A DNA sequence (GBSx0065) was identified in S.agalactiae <SEQ ID 207> which encodes the amino acid 
sequence <SEQ ID 208>. This protein is predicted to be D-glutamic acid adding enzyme MurD (murD).
Analysis of this protein sequence reveals the following: 



Possible site: 29 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9615> which encodes amino acid sequence <SEQ ID 9616>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 



MKTITTFENKKVLVLGLARSGEAAJ^LIAKL^I\^miC3KPFDENPTAQSLLEEGIKOT 64 
MK I F+NKKVLVLGLA+SGE+AARLL KLGAIVTVNDGKPF++NP AQ LLEEGIKV+ 
MKVIDQFKHKKVLVLGIAKSGESAARLLDKLGAIVTVNDGKPFEDNPAAQCLLEEGIKVI 60 



G HPLELLDE+F M+KNPGIPY+NPM+---KAL K IPVLTEVELAYL+SE+ +IGITGS 



NGKTTTTTMI EVL A GQ GLL+GNIG+PAS+V Q A DK+TLVMELSSFQLMGV+EF 



P IAVITNLMPTH+DYHG FE+YVAAKWNIQN+M+++DFLVLNFNQ + K+LA T+AT+ 



VPFST EKVDGAY++D QL+++GE +M+ + + IGVFGSHNVENAIATIAVAKL G+ NQ I 



+ETLS FGGVKHRLQ + + G+ FYNDSKSTNIIATQKALSGFDN+KV+LIAGGLDRGN 



EFDEL+PDITGLK MV+LG+SA RVKRAA KAGV Y +A D+ DA KAYE+A QGDV+L 



LSPANASWDMY NFEVRGD FIDT L+ 
LSPANASWDMYANFEWGDLFIDTVAELK 449 

A related DNA sequence was identified in S.pyogenes <SEQ ID 209> which encodes the amino acid 
sequence <SEQ ID 210>. Analysis of this protein sequence reveals the following: 

Possible site: 25 
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

RGD motif: 436-438 
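
The "RGD motif" entries give the 1-based first and last residue positions of an Arg-Gly-Asp tripeptide in the protein. The following is a minimal sketch in Python of how such positions could be located, not part of the original analysis; the function name `find_motif` and the short test sequence are hypothetical and are not taken from any SEQ ID of this document.

```python
def find_motif(sequence, motif="RGD"):
    """Return 1-based inclusive (start, end) positions of every occurrence."""
    hits = []
    start = sequence.find(motif)
    while start != -1:
        # str.find is 0-based; protein coordinates are reported 1-based.
        hits.append((start + 1, start + len(motif)))
        start = sequence.find(motif, start + 1)
    return hits


if __name__ == "__main__":
    toy_sequence = "MKRGDAAARGDL"  # hypothetical sequence for illustration
    for first, last in find_motif(toy_sequence):
        print(f"RGD motif: {first}-{last}")
```

Applied to a full-length amino acid sequence, each returned pair corresponds to one line of the form "RGD motif: first-last".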

An alignment of the GAS and GBS proteins is shown below: 

Identities = 329/451 (72%), Positives = 397/451 (87%)

Query: 5 MKTITTFENKKVLVLGLARSGEAAARLIAKLGAIVTVNDGKPFDENPTAQSLLEEGIKW 64 

MK 1+ F+NKK+L+LGLA+SGEAAA+LL KLGA+VTVND KPFD+NP AQ+LLEEGIKV+ 
Sbjct: 1 MKVISNFQNKKILILGIAKSGFAA&KLLTKLGAL^^ 60 

Query: 65 CGSHPLELLDEDFCYMI KNPGI PYNNPMVKKALEKQIPVLTEVELAYLVSESQLIGITGS 124 

CGSHP+ELIiDE+F YM+KNPGI PY+NPMVK+AL K+IP+LTEVELAY VSE+ +IGITGS 
Sbjct: 61 CGSHPVELLDFJ^FEYMVKNPGIPYDNPMWEALAKEIPILTEVELAYFVSEAPIIGITGS 120 

Query: 125 NGKTTTTTMIAEVLNAGGQRGLLAGNIGFPASEWQAANDKDTLVMELSSFQLMGVKEFR 184 



NGKTTTTTMIA+VLNAGGQ LL+GNIG+PAS+WQ A DTLVMELSSFQL+GV FR
Sbjct: 121 NGJCTTTTTMIADVLNAGGQSALLSGNIGYPASKWQKAIAGDTLVMELSSFQLVGWAFR 180 

Query: 185 PHIAVITNLMPTHLDYHGSFEDYVAAKMJIQNQMSSSDFLVL^ 244 
PHIAVITNLMPTHLDYHGSFEDYVAAKW IQ QM+ SD+L+LN NQ IS LAKTTKAT+

Sbjct: 181 PHIAVITNLMPTHLDYHGS FEDYVAAKWM I QAQMTE SDYLILNANQE I SATLAKTTKATV 240 

Query: 245 VPFSTTEKVDGAWQDKQLFYKGENIMSVDDIGVPGSHNVENAIATIAVAKLAGISNQVI 3 04 
+PFST + VDGAY++D L++K + I++ D+GVPGSHN+ENALATIAVAKL+GI++ +1 
Sbjct: 241 IPFSTQKVVDGAYLKDGILyFKEQAIIAATDLGVPGSHNIENALATIAVAKLSGIADDII 300

Query: 305 RETLSNFGGVKHRLQSLGKVHGISFYNDSKSTNIIATQKALSGFDNTK^ILIAGGLDRGN 364 

+ LS+FGGVKHRLQ +G++ I+FYNDSKSTNILATQKALSGFDN+++ILIAGGLDRGN 
Sbjct: 301 AQCLSHFGGVKHRLQRVGQIKDITFYNDSKSTNILATQKALSGFDNSRLILIAGGLDRGN 360 


Query: 365 EFDELIPDITGLKHM\'VLGESASRVKRAAQKAGVTYSDALDVRDAVHKAYEVAQQGDVIL 424 

EFD+L+PD+ GLK M++LGESA R+KRAA KA V+Y +A +V +A A+++AQ GD IL 
Sbjct: 361 EFDDLVPDLLGLKQMIILGESAERMKRAANKAaVSYLFARWA^TELAFKIAQTGDTIL 420 

Query: 425 LSPANASWDMYKNFEVRGDEFIDTFESLRGE 455

LSPANASWDMY NFEVRGDEF+ TF+ LRG+ 
Sbjct: 421 LS PANASWDMYPNFE VRGDE FLAT FD CLRGD 451 

SEQ ID 208 (GBS305) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 51 (lane 11; MW 53.7kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 56 (lane 3; MW 79kDa). 

The GBS305-GST fusion product was purified (Figure 207, lane 8) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 270), which confirmed that the protein is immunoaccessible
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 66 

A DNA sequence (GBSx0066) was identified in S.agalactiae <SEQ ID 211> which encodes the amino acid
sequence <SEQ ID 212>. Analysis of this protein sequence reveals the following: 

RGD motif: 285-287

Possible site: 60 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -1.65 Transmembrane 74 - 90 ( 73 - 93)

Final Results 

bacterial membrane --- Certainty=0.1659 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 213> which encodes the amino acid 
sequence <SEQ ID 214>. Analysis of this protein sequence reveals the following: 

Possible site: 37 


>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.33 Transmembrane 81 - 97 ( 80 - 100) 
INTEGRAL Likelihood = -0.16 Transmembrane 272 - 288 ( 271 - 288) 

Final Results

bacterial membrane --- Certainty=0.1532 (Affirmative) < succ>

A related sequence was also identified in GAS <SEQ ID 9141> which encodes the amino acid sequence
<SEQ ID 9142>. Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.33 Transmembrane 74 - 90
INTEGRAL Likelihood = -0.16 Transmembrane 265 - 281

Final Results

bacterial membrane --- Certainty=0.1532 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

RGD motif: 286-288 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 249/358 (69%), Positives = 293/358 (81%), Gaps = 1/358 (0%) 

MGKKIVFTGGGTVGHVTLNLILIPKFIKDGWEVHYIGDKNGIEHEQINQSGLDITFHSIA 60 
M KKI + FTGGGTVGHVTLNLI LI PKF I KDGWEVHY1GDKNGIEH +1 +SGLD+TFH+IA 
MPKKILFTGGGTVGHVTLNLILIPKFIKDGWEVHYIGDKNGIEHTEIEKSGLDVTFHAIA 67 

TGKLRRYFSWQNMLDVFKVGVGVLQS IAI IAKLRPQALFSKGGFVSVPPWAARLLKVPV 120 
TGKLRRYFSWQN+ DVFKV +G+LQS+ I+AKLRPQALFSKGGFVSVPPWAA+LL PV 
TGKIJ»YFSWQNIJU3VFKA/a]^I^SLFIVAKLRP^ 127 

FVHESDLSMGIMKIAYKFATII^TTFEQSKDLIKTKHIGAVTKVM-DCKKSFENTDLTS 179 
F+HESD SMQLANKIAYKFAT MYTTFEQ L K KH+GAVTKV D + E+T L + 
FIHESDRSMGLANKIAYKFATTWTTFEQEDQLSKTOILGAVTICVFKDANQMPESTQLEA 187 

IKEAFDPNLKTLLFIGGSAGAKVETOFITQTPELEEKYNVINISGDSSLNRLKKNLYRVD 239 
+KE F +LKTLLFIGGSAGA VFN FI+ PEL+++YN+INI+GD LN L +LYRVD 
VKEYFSRDLKTLLFIGGSAGAHVFNQFISDHPELKQRYNIINITGDPHLNELSSHLYRVD 24 7 

YVTDLYQPLM^nijADWVTRGGSOTIFELVAMKICLjHLIIPLGREASRGDQLENAAYFEEKG 299 
YVTDLYQPLM +AD+WTRGGSNT+FEL+AM KLHLI+PLG+EASRGDQLENA YFE++G 
YVTDLYQPLMAMADLVVTRGGSNTLFELLAMAKLHLIVPLGKEASRGDQLENATYFEKRG 307 

YALQLPESELNINTLEKQINLLISNSESYEKKMSBSSEIKSQDEFYQLLIDDMAKVTK 357 
YA QL E +L ++ ++ + L + YE M + EI+S D FY LL D++ K 
YAKQLQEPDLTLHNFDQAMADLFEHQADYEATMLATKEIQSPDFFYDLLRADISSAIK 365 
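The summary line that heads the alignment above packs three ratios over the same alignment length. A minimal Python sketch for pulling them apart and recomputing the percentages follows; `SUMMARY`, `parse_summary` and `percent` are hypothetical names introduced for illustration. In this particular line the printed percentages agree with the truncated (not rounded) ratios, which is an observation about this example only.

```python
import re

# Identities and Positives are always present; Gaps is optional.
SUMMARY = re.compile(
    r"Identities\s*=\s*(\d+)/(\d+).*?Positives\s*=\s*(\d+)/(\d+)"
    r"(?:.*?Gaps\s*=\s*(\d+)/(\d+))?"
)

def parse_summary(line):
    """Split an alignment summary line into integer counts."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError("not an alignment summary line")
    ident, length, positives, _, gaps, _ = match.groups()
    return {
        "length": int(length),
        "identities": int(ident),
        "positives": int(positives),
        "gaps": int(gaps) if gaps is not None else 0,
    }

def percent(count, length):
    """Exact percentage of count over the alignment length."""
    return 100.0 * count / length

summary = parse_summary(
    "Identities = 249/358 (69%), Positives = 293/358 (81%), Gaps = 1/358 (0%)"
)
for key in ("identities", "positives", "gaps"):
    print(key, summary[key], f"{percent(summary[key], summary['length']):.1f}%")
```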

SEQ ID 212 (GBS306) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 51 (lane 12; MW 43kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 56 (lane 4; MW 68kDa). 
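As a quick plausibility check on the two apparent molecular weights, the difference between the GST-fusion and His-fusion bands should roughly equal the extra mass of the GST tag. The Python sketch below does that arithmetic for the values quoted in this document; the function name `tag_mass_difference` is hypothetical. The commonly quoted size of GST (roughly 26 kDa) is background knowledge rather than a figure from this text, and SDS-PAGE estimates are approximate.

```python
def tag_mass_difference(his_fusion_kda, gst_fusion_kda):
    """Difference between the two fusion bands, in kDa (one decimal place)."""
    return round(gst_fusion_kda - his_fusion_kda, 1)

# (His-fusion MW, GST-fusion MW) in kDa, as quoted in the expression paragraphs.
observed = {
    "GBS306": (43.0, 68.0),
    "GBS73": (47.8, 70.1),
    "GBS163": (44.0, 69.0),
}

for name, (his_kda, gst_kda) in observed.items():
    print(name, tag_mass_difference(his_kda, gst_kda), "kDa")
```

Differences in the low-to-mid twenties of kDa are in line with a GST tag replacing a short His tag.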

GBS306-GST was purified as shown in Figure 207, lane 9. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 67 

A DNA sequence (GBSx0067) was identified in S.agalactiae <SEQ ID 215> which encodes the amino acid 
sequence <SEQ ID 216>. This protein is predicted to be cell division protein DivIB. Analysis of this protein 
sequence reveals the following: 

Possible site: 58 





Final Results

bacterial membrane --- Certainty=0.6731 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC95451 GB:AF068902 cell division protein DivIB [Streptococcus pneumoniae] 
Identities = 119/396 (30%), Positives = 214/396 (53%), Gaps = 38/396 (9%) 



Sbjct: 



Query: 
Sbj c 



54 ISSPEEPQNTTKIKKLHFPKIS-- 



— RPKIEKKQKKEKIVNSLAKTNR 97 

S + +++ K+ K++ P+ ++K++++K ++ A + 

65 QESETDQEDSESAKEESEEKA/ASSEADKEKEEKEEPESKEKEEQDKKLSKKATKEKPAKA 124 

98 IRTAPIFV\%FLVILVSVFLLTPFSKQKTITVSGNQHTPDDILIEKTNIQKND 150 

+R I + L+++VS +LL+P++ KIVG T D + + + IQ+D 
125 KIPGIHILRAFTILFPSLLLLIVSAYLLSPYATMKDIRVEGTVQTTADDIRQASGIQDSD 184 

151 YFFSLIFKHKAIEQRIAAEDVWVKTAQMTYQFPNKFHIQVQENKIIAYAHTKQGYQPVLE 210 

Y +L+ E+++ + + WV++AQ+ YQFP KF I+V+E I+AY + + + P+L 

185 YTINLLLDKAKYEKQIKS-NYWVESAQLVYQFPTKFTIKVKEYDIVAYYISGENHYPILS 243 

211 TGK-KADPVNSSELPKHFLTINLDKEDSIKLLIKDLKALDPDLISEIQVISLADSKTTPD 269 

+G+ + V+ + LP+ +L++ + + IK+ + +L + P+L + IQ + LA SK T D 
244 SGQLETSSVSIMSLPETYLSVLFITOSEQIKVFVSEIjAQISPELKAAIQICVELAPSKVTSD 3 03 

270 LLLLDMHDGNSIRIPLSKFKERLPFYKQIKKNLKEPSIVDMEVGWTTTNT1ESTPVKAE 329 
L+ L M+D + + +PLS+ ++LP+Y +IK L EPS+VDME G+Y+ T + E 

PLSEMSKKLPYYSKIKPQLSEPSVVDMEAGIYSYTVADKLIMEVEE 3 53 



330 DTKNKSTDKTQTQNGQVAENSQGQTNNSNTNQCGQQ 3 65 

K ++ + + Q E + Q SN NQ Q+ 

364 KAKQEAKEAEKKQE EEQKKQEEESNRNQTTQR 395 



A related DNA sequence was identified in S.pyogenes <SEQ ID 217> which encodes the amino acid
sequence <SEQ ID 218>. Analysis of this protein sequence reveals the following:

Possible site: 59
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -9.45 Transmembrane 106 - 122 ( 102 - 125)

Final Results

bacterial membrane --- Certainty=0.4779 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 152/381 (39%) , Positives = 232/381 (59%) , Gaps = 14/381 (3%) 
Sbjct: 

Query: 61 - -QNTTKIKKLHFPKISRPKIEKK- -QKKEKIVNSLAKTNRIRTAPIFWAFLVILVSVF 116 

T +++ S+PK KK Q KEK +A ++ P+ + A L++ VS+F 

Sbjct: 63 KTDE3KTDSQEIESETTSKPKKTKKVRQPKEKSATQIAFQ KSLPVLLGALLLMAVSIF 119 



Query: 117 IjLTPFSKQKTIWSGNQHTPDDILIEKTNIQKNDYFFSLIFKHKAIEQRI^AAEDvWVKTA 176 
++TP+SK+K +V GN T D LI+ + ++ +DY+ +L+



Query: 177 QMTYQFPNKFHIQVQENKI IAYAII1 KC 3YC L 'TET:KKADPVNSSEliPKHFLTINLDKED 236 

+4YQFPN F V E +IIAYA + G+QP+LE GK+ D V +SELPK FL +NL E 
Sbjct: 180 HLSYQFPNHFLFNVIEFEIIAYAQTONGFQPILENGKRVDKVRASELPKSFLILNLKDEK 239 

Query: 237 SIICLLIKDLKALDPDLISEIQVISLADSKTTPDLLLLDMHDGNSIRIPLSKFKERLPFYK 296 

+1+ L+K L L L+ 1+ +SLA+SKTT DLLL++MHDGN +R+P S+ +LP+Y+ 
Sbjct: 240 AIQQLVKQLTTLPKKLVKNIKSVSLANSKTTADLLLIEMHDGNVVRVPQSQLTLKLPYYQ 299 

Query: 297 QIKKNLKEPSIVDMEVGVYTTTNTIESTPVKAEDTKNKSTDKTQTQNGQVAENSQGQTNN 356 

++KKNL+ SIVDMEVG+YTTT IE+ P + + DK 4 G+ Q QT+N 

Sbjct: 300 KLICKNLENDSIVDMEVGIYTTTQEIENQPEVPLTPEQNAADKEGDKPGE HQEQTDN 355 

Query: 357 SNTNQQGQQIATEQAPNPQNV 377 

+ Q + P+P+ V 

Sbjct: 356 DSETPANQSSPQQTPPSPETV 376 

SEQ ID 216 (GBS85) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 17 (lane 10; MW 45.2kDa). 

The GBS85-His fusion product was purified (Figure 105A; see also Figure 193, lane 5) and used to 
immunise mice (lane 1 product; 20ug/mouse). The resulting antiserum was used for Western blot (Figure 
105B), FACS (Figure 105C), and in the in vivo passive protection assay (Table III). These tests confirm 
that the protein is immunoaccessible on GBS bacteria and that it is an effective protective immunogen. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 68 

A DNA sequence (GBSx0068) was identified in S.agalactiae <SEQ ID 219> which encodes the amino acid 
sequence <SEQ ID 220>. This protein is predicted to be cell division protein FtsA (ftsA). Analysis of this 
protein sequence reveals the following: 

Possible site: 56 



Final Results

bacterial membrane --- Certainty=0.2275 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC95439 GB:AF068901 cell division protein FtsA [Streptococcus pneumoniae] 
Identities = 292/457 (63%) , Positives = 366/457 (79%) , Gaps = 1/457 (0%) 

Query: 

Sbjct: 

Query: 61 VKQAEEKAGITIDKINVGLPAl'ILLQIEPTQGMIPVPI'IESKEIKDEDVESVVKSALTKSIT 120 

+ QAEEKAGI+I +NVGLP NLLQ+EPTQGMIPV +++KEI D+DVE+ WKSALTKS +T 
Sbjct: 61 ISQAEEKAGISIKSVTTOGLPGNLLQVEPTQGMIPVTSDTKEITDQDvENVVKSALTKSMT 120 

Query: 121 PEREVISLIPLEFIvTJGFQGIRDPRGNMSIRLEMRGLIYTGPTTILHNLRKTVERAGIKV 180 

P+REVI+ IP EFIVDGFQGIRDPRGMMG+RLEMRGL+YTGP TILHNLRKTVERAG++V 
Sbjct: 121 PDREVITFIPEEFIvDGFCGIRDPRG^IMGWLEI■IRGI J LYTGPRTILHNLRIC^VERAGVQV 180 
Query: 181
E+V+I+PLA+ +SVLNEGEREFGATV:DKG GQTTVA++RNQELQ+T+I EG DYVTKD
Sbjct: 181

Query: 241 ISKVLRTTVEIJiEALKFNFGQANVEEASTSDTVQVIIVVGNEEPVEITESYLSQI ISGRIR 300
ISKVL+T+ ++AE LK N+G+A AS +T QV V+G E VE+TE+YLS+IIS RI+
Sbjct: 241 ISKmKTSRKLAEGLKLNYGEAYPPLAS-KETFQVEVIGEVEAVETOEAYLSEIISARIK 299

Query: 301
ILE +KQ+L R RLLDLPGGI+L+GG AI+PG+VE+AQ++FG RVKL+VPNQVGIRNP
Sbjct: 300

Query: 361
FA+VIS+ ++ G ++EV+++AQ A+ G+ L H+P+ F
Sbjct: 360

Query: 421
- +R RG+ GSMFD
Sbjct: 420



A related DNA sequence was identified in S.pyogenes <SEQ ID 221> which encodes the amino acid
sequence <SEQ ID 222>. Analysis of this protein sequence reveals the following:

Possible site: 55

313 - 329 ( 312 - 329)

Final Results

bacterial membrane --- Certainty=0.2338 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:AAC95439 GB:AF068901 cell division protein FtsA [Streptococcus pneumoniae] 
Identities = 299/448 (66%) , Positives = 368/448 (81%) , Gaps = 4/448 (0%) 



Query: 1
Sbjct: 10

Query: 61
Sbjct: 70

Query: 121
PEEFIVDGFQG1RDPRGMMG+RLEMRGL+YTGP T I LHNLRKTVERAG+ +VEN4- 1 1 SPLA
Sbjct: 130

Query: 181
M ++4-LNEGEREFGATVIDMG GQTTVA4-+R QELQ+T+I EGG+Y+TKDISKVLKTS
Sbjct: 190

Query: 241
+AE LK N+G+A ET +V+V+G E VEVTE YLSEIISARI+HIL+++KQ+
Sbjct: 250

Query: 301
L+R RLLDLPGGIVLIGG AI+PG+VE+AQE+FGV VKL+VPNQVGIRNP F++VISL E
Sbjct: 309

Query: 361
f G ++EV++LAQ A+ GE L +PI F G
Sbjct: 369

Query: 418 TAYDSQVPSDPKQKISERVRGIFGSMFD 445

D Q S K K+++R RG+ GSMFD 
Sbjct: 429 PMADFQQASQNKPKLADRFRGLIGSMFD 456 
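The "Identities" count in an alignment summary is simply the number of columns in which the two aligned rows carry the same residue. A minimal Python sketch for the final row above (Query 418-445 against Sbjct 429-456) follows; `compare_rows` is a hypothetical helper. It rebuilds only the identity letters of the middle line, because marking conservative substitutions with "+" would need a substitution matrix, which is outside this sketch.

```python
def compare_rows(query, sbjct, gap="-"):
    """Count identical columns and gap columns in two aligned rows.

    Returns (identities, gaps, midline), where midline carries the shared
    residue in identical columns and a space elsewhere.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    pairs = list(zip(query, sbjct))
    identities = sum(1 for q, s in pairs if q == s and q != gap)
    gaps = sum(1 for q, s in pairs if q == gap or s == gap)
    midline = "".join(q if q == s and q != gap else " " for q, s in pairs)
    return identities, gaps, midline

query = "TAYDSQVPSDPKQKISERVRGIFGSMFD"
sbjct = "PMADFQQASQNKPKLADRFRGLIGSMFD"
identities, gaps, midline = compare_rows(query, sbjct)
print(identities, gaps)
print(midline)
```

Summing such per-row counts over all rows of an alignment gives the numerator of the "Identities" ratio.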

An alignment of the GAS and GBS proteins is shown below: 

Identities = 349/456 (76%) , Positives = 402/456 (87%) , Gaps = 19/456 (4%) 

Query: 10 LD1GTSSIKVLVAEFIANEMNVIGVSNVPSSGVKDGIIIDIEAAATAIKEAVKQAEEKAG 69 

LDIGTSSIKVLVAEFI+ EMNVIGVSNVPS+GVKDGIIIDIEAAATAIK AV+QAEEKAG 
Sbjct: 1 LDIGTSSIKVLVAEFISGEMNVIGVSNVPSTGVKDGIIIDIEAAATAIKTAVEQAEEKAG 60 

Query: 70 ITIDKINVGLPANLLQIEPTQGM1PVPNESKEIKDEDVESWKSALTKSITPEREVISLI 129 

+TI+K+NVGLPANI,LQIEPTQGMIPVP+ESKEIKDEDV-)-SVVKSALTKSITPEREVISL+ 
Sbjct: 61 MTIEKVHVGLPANLLQIEPTQGMIPVPSESKEIKDEDVDSWKSALTKSITPEREVISLV 120 

Query: 130 PLEFITOGFQGIRDPRGNMGlRLEMRGLIYTGPTTILHNLRICTWRkGIIOTEHVVIAPIA 189 

P EFIVDGFQGIRDPRGMMGIRLEMRGLIYTGP+TILHNLRKTVERAGIKVE+++I+PLA 
Sbjct: 121 PEEFIVDGFQGIRDPRGMMGIRLE^GLIYTGPSTILHNLRKTVERAGIKUENIIISPLA 180 

Query: 190 IAKSVLlTOGEREFGATOIDMGGGQTrVASMRNQELQYTNIYSEGSDYVTKDISKVLRTTV 249 

+AK++LNEGEREFGATVIDMGGGQTTVASMR QELQYTNIY+EG +Y+TKDISKVL+T++ 
Sbjct: 181 mKTII^GEREFGATVIDMGGGQTTVASMRAQELQYTNIYAEGGEYITKDISKVLKTSL 240 

Query: 250 EIAFJttKFNFGQAmffiEASTSDTVQVNWGjMEEPVEITESYriSQIISGRIRQILEHVKQD 309 

IAEALKFNFGQA + EAS ++TV+V+WG+EEPVE+TE YLS+IIS RIR IL+ VKQD 
Sbjct: 241 AIAEALKFNFGQAEISEASITETVKVDWGSEEPVEVTERYLSEI1SARIRHILDRVKQD 3 00 

Query: 310 LGRGRLLDrjPGGI ILVGGGAIMPGWEVAQQI FGTRVKLHVPNQVGIRNPMFAOTI SI VD 369 

L RGRLLDLPGGI+L+GGGAIMPGWE+AQ+IFG VKLHVPNQVGIRNPMF+NVI S+V+ 
Sbjct: 301 LERGRLLDLPGGIVLIGGGAIMPGVVEIAQEIFGVTVKLHVPNQVGIRNPMFSNVISLVE 360 

Query: 370 YVGMMSEVDIIAQHAVTGDEMLRHKPVDF DYKEKTNTMSTMPYSEPLTSSME 421 

YVGMMSEVD++AQ AV+G+E+LR KP+DF DY + ST+ Y + + + 

Sbjct: 361 YVGMMSEVDVLAQTAVSGEELLRRKPIDFSGQESYLPDYDDSRRPESTIGYEQQASQTAY 420 

Query: 422 DSNLEPIRARENAQEPTEPKANIGERIRGIFGSMFD 457 

DS Q P++PK I ER+RGIFGSMFD 

Sbjct: 421 DS QVPSDPKQKI SERVRGI FGSMFD 445 

SEQ ID 220 (GBS73) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 17 (lane 5; MW 47.8kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 20 (lane 5; MW 70.1kDa). 

GBS73-GST was purified as shown in Figure 197, lane 7. 

The GBS73-His fusion product was purified (Figure 103A) and used to immunise mice (lane 1 product;
20ug/mouse). The resulting antiserum was used for Western blot (Figure 103B), FACS (Figure 103C) and
in the in vivo passive protection assay (Table III). These tests confirm that the protein is immunoaccessible
on GBS bacteria and that it is an effective protective immunogen.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 69 

A DNA sequence (GBSx0069) was identified in S.agalactiae <SEQ ID 223> which encodes the amino acid 
sequence <SEQ ID 224>. This protein is predicted to be cell division protein FtsZ (ftsZ). Analysis of this
protein sequence reveals the following: 

Possible site: 56 
117 - 133 ( 117 - 133)

Final Results

bacterial membrane --- Certainty=0.1786 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC95440 GB:AF068901 cell division protein FtsZ [Streptococcus pneumoniae] 
Identities = 327/426 (76%), Positives = 363/426 (84%), Gaps = 7/426 (1%) 

Query: 1 IWFSFDTASVQGAVIKVIGVGGGGGNAINRMIDEGVAGVEFITaANTDIQALSSSKAETVI 60 
M FSFDTA+ QGAVIKVIGVGGGGGNAINRM+DEGv GVEFIAANTD+QALSS+KAETVI

Sbjct: 1 MTFSFDTAAAQGAVIKVIGVGGGGGNAI1MWDBGVTGVEFIAANTDVQALSSTKAETVI 60 

Query: 61 QLGPKLTRGLGAGGQPEVGRKAAEESEEVLTEALTGADMVFITAGMGGGSGTGAAPVIAR 120 
QLGPKLTRGLGAGGQPEVGRKAAEESEE LTEA++GADMVFITAGMGGGSGTGAAPVIAR 
Sbjct: 61 QLGPKLTRGLGAGGQPEVGRKAAEESEETLTEA1SGADMVFITAGMGGGSGTGAAPVIAR 120

Query: 121 IAKSLGALTVAVITRPFGFEGNKRSNFAIEGIQELREQVDTLLIISNNNLLEIVDKKTPL 180 

1AK LGALTV V+TRPFGFEG+KR FA+EGI +LRE VDTLLIISNNNLLEIVDKKTPL 
Sbjct: 121 1AKDLGALTVGWTRPFGFEGSKRGQFAVEGINQLREHVDTLLIISNNNLLEIVDKKTPL 180 


Query: 181 LEALSEADNVLRCGVQGITDLITNPGLINLDFADVKTVMANKGNALMGIGIGSGEERITE 240 
Sbjct: 

Query: 241 AARKAiySPLLETTIDGAEDVIVNVTGGMDMTLTEAEEASEIVSQAAGKGVNIWLGTSID 300

AARKAIYSPLLETTIDGAEDVIVNVTGG+D+TL EAEEAS+IV+QAAG+GVNIWLGTSID 
Sbjct: 241 AARKAIYSPLLETTIDGAEDVIVNVTGGLDLTLIEAEEASQrVNQAAGQGVNIWLGTSID 300 

Query: 301 MDMKDEIRVTWATGVRIODICrNQVSGFTTSAPTNQAPSERQSTSNSNFDRRGNFDMTESR 360 
M+DEIRVTWATGVR+D+ +V + TN + + + S+ FDR +FDM E+

Sbjct: 301 ESMRDEIRVTVVATGVRQDRVEI<VVAPQARSATNYRETVKPAHSH-GFDR--HFDMAETA 357 



Query: 361 EMPTQQNQPHAQNQQQSSAFGNWDLRRDNISRPTEGELDSKLSMSTFSENDDMDDELETP 420 

E+P Q P Q+SAFG+WDLRR+-J-I R T+ + D +DEL+TP 

Sbjct: 358 ELPKQ- -NPRRLEPTQASAFGDWDLRRESIVRTTDSWSPVERFEAPISQD- -EDELDTP 413 

Query: 421 PFFKNR 426 

PFFKNR 
Sbjct: 414 PFFKNR 419 

A related DNA sequence was identified in S.pyogenes <SEQ ID 225> which encodes the amino acid 
sequence <SEQ ID 226>. Analysis of this protein sequence reveals the following: 

Possible site: 56 

i cleavable N-term signal seq. 

membrane 117 - 133 ( 117 - 133) 

Final Results 

bacterial membrane --- Certainty=0.1723 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 372/439 (84%), Positives = 391/439 (88%), Gaps = 13/439 (2%) 

Query: 1 ^WFSFDTASVQGAVIKVIGVGGGGGNAINRMIDEGVAGvEFIAANTDIQALSSSKAETVI 60 

M FSFDTAS+Q/3A+IKVIGVGG2GGNAINRI4IDEGVAGVEFIAANTDIQALSSSKAETVI 
Sbjct: 1 ^FSFDTASIQGAIIKVIGVGGGGGI^INRMIDEGVAGVEFIAANTDIQALSSSKAETVI 60 
QLGPKLTRGLGAGGQPEVGRKAAEESEE+LTEALTGADWFITAGMGGGSGTGAAPVIAR 



LEALSEADNVLRQGVQGIXDilT+FGLINLDFADVKTVMANKGNALMGIGIGSGEERI E 



AARKAIYSPLLETTIDGA+DVIVNOTGG+DMTLTEAEEASEIV QAAG+GVNIWLGTSID 



MDMKDEIRVTWATGVRKDKTNQVSGF TTSAPTN QAPSERQSTSNSNFD 349 

MKD+IRVTWATGVR4-+K QVSGF T TN A + + + FD 

DTMKDDIRVTVVATGVRQEKAEQVSGFRQPRTFTQTNAQQVAGAQYASDQAKQSVQPGFD 360 



! NDD DDELETPPFFKMR 



SEQ ID 224 (GBS163) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 28 (lane 7; MW 44kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 34 (lane 4; MW 69kDa). 

The GBS163-GST fusion product was purified (Figure 114A; see also Figure 198, lane 11) and used to 
immunise mice (lane 1 product; 20ug/mouse). The resulting antiserum was used for Western blot (Figure 
114B), FACS and in the in vivo passive protection assay (Table III). These tests confirm that the protein is 
immunoaccessible on GBS bacteria and that it is an effective protective immunogen. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 70 

A DNA sequence (GBSx0070) was identified in S.agalactiae <SEQ ID 227> which encodes the amino acid 
sequence <SEQ ID 228>. Analysis of this protein sequence reveals the following:

Possible site: 21

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2750 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:



Query: 2 MNLQENKTAIFDNVSKLALKAGRAHESVHIVAVTKYVNCQTTEALIRTGVNHIGENRVDK 61

MN++EN +F V++ +L A R SV ++AVTKYV+ T EAL+ GV+HIGENRVDK
Sbjct: 1 MNVKENTELVFREVAEASLSAHRESGSVSVIAVTKYVDVPTAEALLPLGVHHIGENRVDK 60

Query: 62 FLEKYQALKDEKLTWHLIGSLQRRKVKDVINYVDYFHALDSVKLAAEIQKHAQKLIKCFL 121

FLEKY+ALKD +TWHLIG+LQRRKVKDVI YVDYFHALDSVKLA EIQK + ++IKCFL
Sbjct: 61 FLEKYEALKDRDVTWHLIGTLQRRKVKDVIQYVDYFHALDSVKLAGEIQKRSDRVIKCFL 120

Query: 122 QVNISREDSKHGFTIEQIDDALNLISRYDKIELIGIMTMAPLKATKEEISSIFEETESLR 181

QVNIS+E+SKHGF+ E++ + L ++R DKIE +G+MTMAP +A+ E++ IF+ + L+
Sbjct: 121 QVNISKEESKHGFSREELLEILPELARLDKIEYVGLMTMAPFEASSEQLKEIFKAAQDLQ 180

Query: 182 KRLQARNIERMPFTELSMGMSRDYDIAIQNGSTFVRIGTSFFK 224

+ +Q + I MP TELSMGMSRDY AIQ GSTFVRIGTSFFK
Sbjct: 181 REIQEKQIPNMPMTELSMGMSRDYKEAIQFGSTFVRIGTSFFK 223

A related DNA sequence was identified in S.pyogenes <SEQ ID 229> which encodes the amino acid
sequence <SEQ ID 230>. Analysis of this protein sequence reveals the following:

Possible site: 20

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2451 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below:

Identities = 133/222 (59%), Positives = 164/222 (72%)

Query: 2 MNLQENKTAIFDNVSKLALKAGRAHESVHIVAVTKYVNCQTTEALIRTGVNHIGENRVDK 61
M+L NK IF+ + A R ++SV ++AVTKYV+ LI G+ HI ENRVDK
Sbjct: 1 MDLLTNKKKIFETIRLSTEAANRTNDSVSVIAVTKYVDSTIAGQLIK^GIEHIAENRVDK 60

Query: 62 FLEKYQALKDEKLTWHLIGSLQRRKVKDVINYVDYFHALDSVKLAAEIQKHAQKLIKCFL 121
FLEKY ALK + WHLIG+LQRRKVK+VINYVDYFHALDSV+LA EI K A +KCFL
Sbjct: 61 FLEKYDALKYMPVKWHLIGTLQRRKVKEVINYVDYFHALDSWIALEINKRADHPVKCFL 120

Query: 122 QVNISREDSKHGFTIEQIDDALNLISRYDKIELIGIMTMAPLKATKEEISSIFEETESLR 181
QVNIS+E+SKHGF I +ID+A+ I + +KI+L+G+MTMAP A+KE I +IF + LR
Sbjct: 121 QVNISKEESKHGFNISEIDFAIGEIGKMEKIQLVGLMTMAPANASKESIITIFRQANQLR 180

Query: 182 KRLQARNIERMPFTELSMGMSRDYDIAIQNGSTFVRIGTSFF 223
K LQ + + MPFTELSMGMS DY IAIQ GSTF+RIG +FF
Sbjct: 181 KNLQLKKRKNMPFTELSMGMSNDYPIAIQEGSTFIKIGRAFF 222

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 71 

A DNA sequence (GBSx0071) was identified in S.agalactiae <SEQ ID 231> which encodes the amino acid
sequence <SEQ ID 232>. This protein is predicted to be YlmF. Analysis of this protein sequence reveals 
the following: 



>>> Seems to have no N-terminal signal sequence 



Final Results

bacterial cytoplasm --- Certainty=0.2194 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9617> which encodes amino acid sequence <SEQ ID 9618>
was also identified.

The protein has homology with the following sequences in the GENPEPT database: 



Query: 5 MALKDRFDKIISYFDTDDVSENEVKEVQERTSVQRDSRAATAQEASQRSHMTNSAEEEMI 64

M+LKDRFD+ I YF T+D 4 +E +RD T+ +SQ + + 4-

Sbjct: 1 MSLKDRFDRFIDYF-TEDEDSSLPYE KRDEPVFTSVNSSQEPALPMNQPSQSA 52

Query: 65 GSRPRTYTYDPNRQERQRVQRDNAYQQATPRVQNKDSVRQQREQVTIALKYPRKYEDAQE 124

G++ T RQ+ + N Q+AT ++V I ++YPRKYEDA E

Sbjct: 53 GTKENNITRLHARQQ ELANQSQRAT DKVI IDVRYPRKYEDATE 95

Query: 125 IVDLLIVNEC^IDFQYMLDAQARRCmYIDGASRVLYGSLQKVGESMFLLTPANVMVDI 184

IVDLL NE +LIDFQYM + QARRCLDY+DGA VL G+L+KV S+M+LLTP NV+V++
Sbjct: 96 IVDLJ^GNESILIDFQYMTEVQARRCLDYLDGACITTIAGNLKKVASTMYLLTPVIIVIVNV 155

A related DNA sequence was identified in S.pyogenes <SEQ ID 233> which encodes the amino acid
sequence <SEQ ID 234>. Analysis of this protein sequence reveals the following:

Possible site: 49
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.64 Transmembrane 142 - 158 ( 142 - 158)

Final Results

bacterial membrane --- Certainty=0.1256 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
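The INTEGRAL line in the output above gives a likelihood score and the coordinates of a predicted membrane-spanning segment. The short Python sketch below reads such a line and derives the segment length; `INTEGRAL` and `parse_integral` are hypothetical names, and the second pair of coordinates in parentheses is deliberately ignored because its meaning is not explained in this text.

```python
import re

# Likelihood value followed by the first "start - end" coordinate pair.
INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)

def parse_integral(line):
    """Return likelihood, start, end and length of a predicted segment, or None."""
    match = INTEGRAL.search(line)
    if match is None:
        return None
    start, end = int(match.group(2)), int(match.group(3))
    return {
        "likelihood": float(match.group(1)),
        "start": start,
        "end": end,
        "length": end - start + 1,  # coordinates are inclusive
    }

segment = parse_integral(
    "INTEGRAL Likelihood = -0.64 Transmembrane 142 - 158 ( 142 - 158)"
)
print(segment)
```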

The protein has homology with the following sequences in the databases: 

>GP:AAC95442 GB:AF068901 YlmF [Streptococcus pneumoniae] 
Identities = 82/219 (37%) , Positives = 113/219 (51%) , Gaps = 46/219 (21%) 

Query: 5 MAFKDTFNKMISYFDTDEVNEVEEDVAASTDNV1P- -RSQQSVRASSHPKQEPRNNHVQQ 62 

M+ KD F++ I YF DE D+ +P + + V S + QEP Q 

Sbjct: 1 MSLKDRFDRF IDYFTEDE DSSLPYEKRDEPVFTSVNSSQEPALPMNQP 48 

Query: 63 DHQARSQEQTRSQMHPKHGTSERYYQQSQPKEGHEMTORRKRMSTSSIANRREQYQQSTC 122 

A ++E +++H + +AN Q 
Sbjct: 49 SQSAGTKENNITRLHARQ QELAN QSQRA 76 

Query: 123 SDQTTIALKYPRKYEDAQEIVDLLIVNECVLIDFQFMLDAQARRCLDFIDGASKVLYGSL 182 

+D+ I ++YPRKYEDA EIVDLL NE +LIDFQ+M + QARRCI1D++DGA VL G+L 
Sbjct: 77 TDKVI IDVRYPRKYEDATE I VDLLAGNES I L I DFQYMTEVQARRCLDYLDGACHVIjAGNL 136 

Query: 183 QEWGSSMYLLAPSNVSVNIEEMTIPHTTQDIGFDFDMKR 221 

+KV S+MYLL P NV VN+E++ +P Q F FDMKR 
Sbjct: 137 KKVASTMYLLTPVNVIVNVEDIRLPDEDQQGEFGFDMKR 175 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 118/222 (53%) , Positives = 145/222 (65%) , Gaps = 17/222 (7%) 

Query: 1 MEGNMALKDRFDKIISYFDTDDVSENEVHEVQERTSV QRDSRAATAQEAS 50 

ME MA KD F+K+ISYFDTD+V+E E +V Q+ R&++ + 

Sbjct: 1 MENKMAFKDTFNKMISYFDTDEVNEVEEDVAASTDNVIPRSQQSVRASSHPKQEPRNNRV 60 

Query: 51 QRSHMTNSAEEEMIGSRPRTYTYDPNRQERQRVQR DNAYQQATPRVQNKDSVRQQR 106 
Query: 107 EQVT IALKYPRKYEDAQE I VDLL I WE CVL I D FQYMLDAQARRCLDYIDGASRVLYG 163

+Q TIALKYPRKYEDAQEIVDLLIWEC^IDFQ+MLDAQARRCLD+IDGAS+VLYG 
Sbjct: 121 TCSDQTTIALKYPRKyEDAQEIVDLLIVWECVLIDFQFMLDAQARRCLDFIDGASKVLYG 180 

Query: 164 SLQKVGSSMFLLTPANVMVDIEEMWIPKTGQETSFDFDMKRR 205 

SLQKVGSSM+LL P+NV V+IEEM IP T Q+ FDFDMKRR 
Sbjct: 181 SLQKVGSSMYLLAPSNVSVNIEEMTIPHTTQDIGFDFDMKRR 222 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 72 

A DNA sequence (GBSx0072) was identified in S.agalactiae <SEQ ID 235> which encodes the amino acid
sequence <SEQ ID 236>. This protein is predicted to be YlmH. Analysis of this protein sequence reveals 
the following: 

Possible site: 35 

>>> Seems to have no N-terminal signal sequence 



Final Results

bacterial cytoplasm --- Certainty=0.3956 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: 6 IYQHFRPEEYAFIHKIDHLAQYVENTYSFITTEFLNPREFKILESVLERRGSHYYTSGQY 65
IYQHF E+ F+ K + VE++Y+ T F+NP + K+L+ + + G +SG++
Sbjct: 5 IYQHFSIEDRPFLDKGMEWIKKVEDSYAPFLTPFINPHQEKLLKILAKTYGLACSSSGEF 64

Query: 66
Sbjct: 65

Query: 126
ILV+E AQ++++ Q + KIG V L E P +
Sbjct: 125

Query: 186
LD +L4 +LK+SR Q4- +LIE V+VNY V++ + GDLISVR +GR L
Sbjct: 185

Query: 246
TK +K K+ V
Sbjct: 245



A related DNA sequence was identified in S.pyogenes <SEQ ID 237> which encodes the amino acid 
sequence <SEQ ID 238>. Analysis of this protein sequence reveals the following: 



Possible site: 56
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.69 Transmembrane 46 - 62 ( 46 -

Final Results

bacterial membrane --- Certainty=0.1277 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 



Query: 7 IYQHFHQEKYPFIDRNSDMINRVEDYYLLEVTEFLNPREVMILKSLIALTDLKMBVSTDY 66
IYQHF E+ PF+D+- + I +VED Y +T F+NP + +LK L L S ++
Sbjct: 5 IYQHFSIEDRPFLDKGMEWIKKVEDSYAPFLTPFINPHQEKLLKILAKTYGLACSSSGEF 64

Query: 67
SEY RV++ P Y+ E SDF+I+L EI Y KF LTH++ILGT+IN+LG++R LFGD
Sbjct: 65

Query: 127
Sbjct: 125

Query: 187
h +LK SR Q LIE ++VNY V +K+ + +GD++S+R GR LL D G
Sbjct: 185

Query: 247
TK K+KIT+ ++ K
Sbjct: 245 QTKKEKKKITVQLLLSK 261

An alignment of the GAS and GBS proteins is shown below: 

Identities = 123/256 (48%) , Positives = 177/256 (69%) 

Query: 6 IYQHFRPEEYAFIHKIDHIAQYVENTYSFITTEFLNPREFKILESVLERRGSHYYTSGQY 65
IYQHF EEY FI ++ + VE+ Y TEFLNPRE IL+S++ + S Y
Sbjct: 7 IYQHFHQEEYPFIDRMSDMimVEDYYLLEVTEFLNPREVMILKSI.IALTDLKMFVSTDY 66

Query: 66 FQTEYVKVIIAPEYYQLDMADFNLSLIEIKYNAKFNHLTHAKIMGTLLNYLGVKRSILGD 125
+ +EY +VIIAP YY L+ +DF ++L+EI Y AKFN LTH++I+GTL+N LGVKR++ GD
Sbjct: 67

Query: 126
I- ++TKI SV+L EV +L+ + Q L ++ SS R
Sbjct: 127

Query: 186
LD ++ATILK SRTQ LIEA+K+KVNY N+ S+ LV GD++S+RG+GRFTI,
Sbjct: 187

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 73 

A DNA sequence (GBSx0073) was identified in S.agalactiae <SEQ ID 239> which encodes the amino acid 
sequence <SEQ ID 240>. This protein is predicted to be cell division protein DivIVA (septum placement).
Analysis of this protein sequence reveals the following: 
Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5418 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC95445 GB:AF068901 cell division protein DivIVA [Streptococcus pneumoniae] 
Identities = 132/227 (58%), Positives = 179/227 (78%), Gaps = 2/227 (0%) 

Query: 1 MPLTALEIKDKTFSSKFRGYSEEEVNEFLEIVUDDYEDLIRRNREQEQYIKDLEEKIAYF 60 

MP+T+LEIKDKTF ++FRG+ EEV+EFL+IW DYEDL+R N ++ IK LEE+++YF 
Sbjct: 1 MPITSLEIKDKTFGTRFRGFDPEEVDEFLDIVURDYEDLTOANHDKNLRIKSLEERLSYF 60 

Query: 61 NEMKESLSQSVILAQETAERVKISAQDEASNLMGKATFDAQHLIDEAKLKANQILRDATD 120 
+E+K+SLSQSV++AQ+TAERVK +A + ++N++ +A DAQ L++EAK KAN+ILR ATD 

CEILRQATD 120 



Query: 121 DAKRVAIETEDLKRQSRVFHQRLLSELEGQLKIANSSAWEELIjKPTAIYLQNSDASFKEV 180 

+AK+VA+ETE+LK +SRVFHQRL S +E QL + SS WE++L+PTA YLQ SD +FKEV 
Sbjct: 121 NAKKVAVETEELKNKSRVFHQRLKSTIESQLAIVESSDWEDILRPTATYLQTSDEAFKEV 180 

V +VL E P+ + E D TRQFS EM ELQ R+3 ++K+L E 
Sbjct: 181 VSEVLGEPIPAPI--EEEPIDMTRQFSQAEMAELQARI3VADKELSE 225 

A related DNA sequence was identified in S.pyogenes <SEQ ID 241> which encodes the amino acid
sequence <SEQ ID 242>. Analysis of this protein sequence reveals the following:

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.6272 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 180/254 (70%) , Positives = 217/254 (84%) , Gaps = 2/254 (0%) 

Query: 1 MPLTALEIKDKTFSSKFRGYSEEEVNEFLEIWDDYEDLIRRNREQEQYIKDLEEKIAYF 60
M LT LEIKDKTF +KFRGY EEEVNEFL+IWDDYE L+R+NR+ E IKDLEEK++YF
Sbjct: 1 MALTTLEIKDKTFKTKFRGYCEEEVNEFIXIIVvDDYEALTOKNRDNEARIKDLEEKLSYF 60

Query: 61 NEMKESLSQSVILAQETAERVKISAQDEASNLMGKATFDAQHLIDEAKLKANQILRDATD 120
+EMKESLSQSVILAQETAE+VK +A EA+NL+ KAT+DAQHL+DE+K KANQ+LRDATD
Sbjct: 61 DEMKESLSQSVILAQETAEKVKATANAEATNLVSKATYDAQHLLDESKAKANQMLRDATD 120

Query: 121 DAKRVAIETEDLKRQSRVFHQRLLSELEGQLKLANSSAWEELLKPTAIYLQNSDASFKEV 180
+AKRVAIETE+LKRQ+RVFHQRL+S +E QL L+NS W+ELL+ PTAI YLQNSD +FKEV
Sbjct: 121

Query: 181
V+ VL+ED +P DD+ S FDATRQF+ P +E +EELQRRV+ESNK+LE
Sbjct: 181

Query: 241
+NL ETQTFKLNI
Sbjct: 239



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 74

A DNA sequence (GBSx0074) was identified in S.agalactiae <SEQ ID 243> which encodes the amino acid 
sequence <SEQ ID 244>. Analysis of this protein sequence reveals the following: 

Possible site: 61 



Final Results

bacterial membrane --- Certainty=0.1171 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC95446 GB:AF068901 isoleucine-tRNA synthetase [Streptococcus pneumoniae] 
Identities = 730/929 (78%) , Positives = 822/929 (87%) , Gaps = 1/929 (0%) 

Query: 1 MKLKETLNLGQTAFPMRAGLPNKEPQWQB^WXJADIYKKRQAIiNEGKPAFHLHDGPPYAN 60 

MKLK+TLNLG+T FPMRAGLP KEP WQ+ W+ A +Y++RQ LN+GKP F LHDGPPYAN 
Sbjct: 1 MKLKDTLNLGKTEFPMRAGLPTKEPVWQKEWEDAKIiYQRRQEIjNQGKPHFTLHDGPPYAN 60 

Query: 61 GNIHVGHALNKISKDIIVRSKSMSGFRAPYVPG1TOTHGLPIEQVLAKKGVKRKEMDLAEY 120 

GNIHVGHA+NKISKDIIVRSKSKSGF AF+ + FGIOTHGLPIEQVL+K+GVKRKEMDL EY 
Sbjct: 61 GNIHVGHAMNKISKDIIWSKSMSGFYAPFIPGTOTHGLPIEQV1SKCGVKRKEMDLVEY 120 

Query: 121 LEMC3U5YALSCJVDKQRDDFKRLGVSADWFJMPYITLTPDYEADQWVFGAMADKGYIYRGA 180 

L++CR+YALSQVDKQR+DFKRLGVS DWENPY+TLTPDYEA Q+RVFG MA+KGYIYRGA 
Sbjct: 121 LKLCREYALSQVDKQREDFI<RLGVSGDWENPYVTLTPDYERAQIRVFGEMANKGYIYRGA 180 

Query: 1B1 KPVYWSWSSESALAEAEIEYHDIDSTSLYYANKVKDGKGILDTrOTIVVWTTPFTVTAS 240 

KPVYWSWSSESAIAEAEIEYHD+ STSLYYANKVKDGKG+LDTDTYIWWTTTPFT+TAS 
Sbjct: 181 KPVYWSWSSESAIAEAEIEYHDLVSTSLYYANKVKDGKGVLDTDTYIVVWTTTPFTITAS 240 

Query: 241 RGLWGPDMEYVVWPVGSERKYLI^VLVDSIAAKFGWENFEIVTHHTGKELNHIVTEH 300 

RGLTVG D++YV+V PVG RK+++A L+ SIi+ KFGW + +++ + G+EIOTIVTEH 
Sbjct: 241 RGLWGADIDYVIjVQPVGFARKFWAAELLTSIiSEKFGWADVQVLETYRGQEIiNHI'VTEH 300 

Query: 301 PWDTEVEELVILGDHVTTDSGTGI VHTAPGFGEDDYNVGIAWGLDVWTVDSRGLMMENA 360 

PWDT VEELVILGDHVTTDSGTGIVHTAPGFGEDDYNVGIAN L+V VTVD RG+MM+NA 
Sbjct: 301 PWDTAWELVILGDHOTTDSGTGIVHTAPGFGEDDYNVGIAMLEmVTVDERGIMMKMA 360 

Query: 361 GPDFEGQFYDKOTPLVKEKLGDLLLASEVINHSYPFDWRTKKPIIWRAVPQWFASVSKFR 420 

GP+FEGQFY+KV P V EKLG+LL1A E I+HSYPFOWRTKKPIIWRAVPQWFASVSKFR 
Sbjct: 361 GPEFEGQFYEKWPTVIEKLGNLLLAQEEISHSYPFDWRTKKPIIWRAVPQWFASVSKFR 420 

Query: 421 QEILDEIEIOTFQPEWGKKRLYNMIRDRGDWVISRQRAWGVPLPIFYAEDGTAIMTKEVT 480 

QEILDEIEK F EWGK RLYNMIRDRGDWVISRQR WGVPLPIFYAEDGTAIM E 
Sbjct: 421 QEILDEIEKVKFHSEWGKVRLYNMIRDRGDWVI SRQRTWGVPLPIFYAEDGTAIMVAETI 480 

Query: 481 DHVADLFAEYGSIVWWQRDAKDLLPAGYTHFGSPNGLFEKETDIMDVWFDSGSSWNGVMN 540 
+HVA LF ++GS +WW+RDAKDLLP G+THFGSPNG F+KETDIMDVWFDSGSSWNGV+ 

5FKKETDIMDVWFDSGSSWNGVW 540 



Query: 541 AREmSYPADLYLEGSDQYRGWFNSSLITSVAWGHAPYKAVLSQGFVLDGKGEKMSKSL 600 

R L+YPADLYLEGSDQYRGWFNSSLITSVA +G APYK +LSQGF LDGKGEKMSKSL 
Sbjct: 541 NRPELTYPADLYLEGSDQYRGWFNSSLITSVANHGVAPYKQILSQGFALDGKGEKMSKSL 600 

Query: 601 GOTILPSDVEKQFGAEILRLWVTSVDSSITOTOISMDILKQTSETYRKIRNTLRFLIANTS 660 

GNTI PSDVEKQFGAEILRLWVTSVDSSNDVRISMDIL Q SETYRKIRNTLRFLIANTS 
Sbjct: 601 GNTIAPSDVEKQFGAEILRLWVTSVDSSNDVRISMDILSQVSETYRKIRNTLRFLIANTS 660 

Query: 661 DFNPKQDAVAYFjNLGAVDRYMTIKFNQV^/DTINKAYAAYDFMAIYKAVvNFVTVDLSAFY 720 

DFNP QD VAY+ L +VD+YMTI +FNQ+V TI AYA ++F+ 1YKA+VNF+ VDLSAFY 
Sbjct: 661 DFNPAQDTVAYDELRSVDKYMTIRFNQU7KTIRDAYADFEFLTIYKALVNFINVDLSAFY 720 
Query: 721 LDFAKDWyiEAaNSPERRRMQTVFYDILVKLTKLLTPILPHTAEEIWSYLEHEEEEFVQ 780

IiDFAKDWYIE A S ERR+MQTVFYDILVK+TKLLTPILPHTAEEIWSYLE E E+FVQ 
Sbjct: 721 LDFAKDVWIEGAKSLERRQMQTVFYDILVKITKLLTPILPHTAEEIWSYLEFETEDFVQ 780 

Query: 781 LAEMPVAQTFSGQEEILEEWSAFMTLRTQAQKALEEARNAKOTGKSLFAHLTIYASQEVK 840 

L+E+P QTF+ QEEIL+ W+AFM R QAQKALEEARNAKVIGKSLEAHLT+Y ++ VK 
Sbjct: 781 LSELPEVQTFANQEEILDTWAAFMDFRGQAQKALEEARNAKUIGKSLEAHLTVYPNEWK 840 

Query: 841 TLLTALNSDIALLMIVSQLTIADEADKPADSVSFEGVAFTVEHAEGEVCERSRRIDPTTK 900 

TLL A+NS++A L+IVS+LTTA+E P ++SFE VAFTVE A GEVC+R RRIDPTT 
Sbjct: 841 TLLEAWSNVAQLLIVSELTIAEE-PAPEAAljSFEDVAFTVERAAGEVCDRCRRIDPTTA 899 

Query: 901 MRSYGVAVCDASAAIIEQYYPEAVAQGFE 929 

RSY +CD A+I+E+ + +AVA+GFE 
Sbjct: 900 ERSYQAVICDHCASIVEENFADAVAEGFE 928 

A related DNA sequence was identified in S.pyogenes <SEQ ID 245> which encodes the amino acid 
sequence <SEQ ID 246>. Analysis of this protein sequence reveals the following: 

Possible site: 61 
>» Seems to h= 

INTEGRAL Likelihood = -1.70 Transmembrane 849 - 865 ( 848 - 867) 

Final Results 

bacterial membrane --- Certainty=0.1680 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 798/929 (85%), Positives = 857/929 (91%) 

Query: 1 MKIiKETimGQTAFPMRAGLPNKEPQWQEAWrXJADIYKKRQALNEGKPAFHLHDGPPYAN 60
MKLKETLNLG+TAFPMRAGLPNKEPQWQ AW+QA++YKKRQ LN GKPAFHLHDGPPYAN
Sbjct: 1 MKLKETLNLGKTAFPMFAGLPNKEPQWQAAWEQAELYKKRQELNAGKPAFHLHDGPPYAN 60

GNIHVGHALNKISKDIIWSKSMSGF+APYWG5fflTHGLPIEQVLAK+G+KRKEMDLAEY
Sbjct: 61 GNIHVGHALNKISKDIIVRSKSMSGFQAPYVPGTOTHGLPIEQVLAKQGIKRKEMDLAEY 120

Query: 121 LEMCRDYALSQVDKQRDDFKRLGVSADWENPYITLTPDYEADQVRVFGAMADKGYIYRGA 180
LEMCR YALSQVDKQRDDFKRLGVSADWENPY+TL P +EADQ+RVFGAMA+KGYIYRGA
Sbjct: 121 LEMCRQYALSQVDKQRDDFKRLGVSADWENPY\TLDPQFEADQIRVFGAMAEKGYIYRGA 180

Query: 181 KPVYWSWSSESAIAEAEIEYHDIDSTSLYYANKVKDGKGILDTDTYIVVWTTTPFTVTAS 240
KPVYWSWSSESALAEAEIEYHDIDSTSLYYANKVKDGKGILDT+TYIVVWTTTPFTVTAS
Sbjct: 181 KPVYWSWSSESAIAEAEIEYHDIDSTSLYYANKVKDGKGILDTNTYIVVWTTTPFTVTAS 240

Query: 241 RGLTVGPDMEYVVWPVGSERKYLIAEVLVDSIAAKFGWENFEIVTHHTGKEIOTIVTEH 300
RGLTVGPDM+Y+W P GS+R+Y++AE L+DSLA KFGWE+FE + H G +L +IVTEH
Sbjct: 241 RGLTVGPDMDYLVVKPAGSDRQYWAEGLLDSLAGKFGWESFETLASHKGADLEYIVTEH 300

Query: 301 PWDTEVEELVILGDHVTTDSGTGIVHTAPGFGEDDYNVGIANGLDVVVTVDSRGLMMENA 360
PWDT+VEELVILGDHVT +SGTGIVHTAPGFGEDDYNVG L+V VTVD RGLMMENA
Sbjct: 301 PWDTDVEELVILGDHVTLESGTGIVHTAPGFGEDDYNVGTKYKLEVAVTVDERGLMMENA 360

Query: 361 GPDFEGQFYDKVTPLVKEKLGDLLLASEVINHSYPFDWRTKKPIIWRAVPQWFASVSKFR 420
GPDF GQFY+KVTP+V +KLGDLLLA EVINHSYPFDWRTKKPIIWRAVPQWFASVS FR
Sbjct: 361 GPDFHGQFYNKVTPIVIDKLGDLLLAQEVINHSYPFDWRTKKPIIWRAVPQWFASVSDFR 420

Query: 421 QEILDEIEKTNFQPEWGKKRLYNMIRDRGDWVISRQRAWGVPLPIFYAEDGTAIMTKEVT 480
Q+ILDEIEKT F P WG+ RLYNMIRDRGDWVISRQRAWGVPLPIFYAEDGTAIMTKEVT
Sbjct: 421 QDILDEIEKTTFHPSWGETRLYNMIRDRGDWVISRQRAWGVPLPIFYAEDGTAIMTKEVT 480

Query: 481 DHVADLFAEYGSIVWWQRDAKDLLPAGYTHFGSPNGLFEKETDIMDVWFDSGSSWNGVMN 540

Query: 541 ARENLSYPADLYLEGSDQYRGWFNSSLITSVAVNGHAPYKAVLSQGFVLDGKGEKMSKSL 600 

+ENLSYPADLYLEGSDQYRGWFNSSLITSVAVNGHAPYKA+LSQGFVLDGKGEKMSKS 
Sbjct: 541 TKENLSYPADLYLEGSDQYRGWFNSSLITSVAVNGHAPYFAILSQGFVLDGKGEKMSKSK 600 

Query: 601 GNTILPSDVEKQFGAEILRLWVTSVDSSNDVRISMDILKQTSETYRKIRNTLRFLIANTS 660 

GN I P+DV KQ+GA+ILRLWV SVD+ NDVR+SM+IL Q SETYRKIRNTLRFLIANTS 
Sbjct: 601 GNIISPiroVAKQYGADILRLWVASVDTDNDWVSMEILGQVSETYRKIRlITLRFLIANTS 660 

Query: 661 DFMPKQDAVAYENLGAVDRWIKFMQVVDTINKkYAAYDF^IYKAVWFVTVDLSAFY 720 

DFNP D VAY +LG VD+YMTI FNQ+V TI AY YDFMAIYKAVVNFVTVDLSAFY 
Sbjct: 661 DFNPATDTVAYADI^TVDKYMTIVFNQLVATITDAYERYDFmiYKAWNFVTVDLSAFY 720 

Query: 721 LDFAKDWYIEAANSPERRRMQTVFYDILVKLTKLLTPILPHTAEEIWSYLEHEEEEFVQ 780 

LDFAKDWYIEAANS ERRRMQTVFYDILVK+TKLLTPILPHT EEIWSYLEHE E FVQ 
Sbjct: 721 LDFAKDWYIEAANSLERRRMQTVFYDILVKITKLLTPILPHTTEEIWSYLEHESEAFVQ 780 

Query: 781 LAEMPVAQTFSGQEEILEEWSAFMTLRTQAQKALEEARNAKVIGKSLEAHLTIYASQEVK 840 

LAEMPVA+TFS QE+ILE WSAFMTLRTQAQKALEEARNAK+IGKSLEAHLTIYAS+EVK 
Sbjct: 781 LAEMPVAETFSAQEDILEAWSAFMTLRTQAQKALEEARNAKIIGKSLEAHLTIYASEEVK 840 

Query: 841 TLLTALNSDIALLMIVSQLTIADEADKPADSVSFEGVAFTVEHAEGEVCERSRRIDPTTK 900 

TLLTAL+SDIALL+IVSQLTIAD AD PAD+V+FEGVAF VEHA GEVCERSRRIDPTT+ 
Sbjct: 841 TLLTALDSDIALLLIVSQLTIADLADAPADAVAFEGVAFIVEHAIGEVCERSRRIDPTTR 900 

Query: 901 MRSYGVAVCDASAAIIEQYYPEAVAQGFE 929 

MRSY VCD SA IIE+ +PEAVA+GFE 
Sbjct: 901 MRSYNAFVCDHSAKIIEENFPEAVAEGFS 929 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 75 

A DNA sequence (GBSx0075) was identified in S.agalactiae <SEQ ID 247> which encodes the amino acid 
sequence <SEQ ID 248>. Analysis of this protein sequence reveals the following: 
Possible site: 39 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3425 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
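
As an illustration only (it is not part of the analysis reported here), the "Final Results" lines of a localisation prediction such as the one above can be read mechanically. The following Python sketch assumes the layout `bacterial <site> --- Certainty=<value> (<call>)` used in this document; the names `parse_final_results` and `best_site` are hypothetical helpers introduced for the example.

```python
import re

# Assumed line layout (taken from the listings in this document):
#   bacterial cytoplasm --- Certainty=0.3425 (Affirmative) < succ>
# Stray spaces inside the number (e.g. "0. 3425") are tolerated.
RESULT_RE = re.compile(
    r"^\s*bacterial\s+(?P<site>\w+)\W*Certainty\s*=\s*"
    r"(?P<score>[0-9. ]+?)\s*\((?P<call>[^)]*)\)"
)


def parse_final_results(lines):
    """Map each predicted site to a (certainty, call) pair."""
    results = {}
    for line in lines:
        match = RESULT_RE.match(line)
        if match:
            score = float(match.group("score").replace(" ", ""))
            results[match.group("site")] = (score, match.group("call").strip())
    return results


def best_site(results):
    """Return the site with the highest certainty value."""
    return max(results, key=lambda site: results[site][0])


example = [
    "bacterial cytoplasm --- Certainty=0.3425 (Affirmative) < succ>",
    "bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>",
    "bacterial outside --- Certainty=0.0000 (Not Clear) < succ>",
]
results = parse_final_results(example)
print(best_site(results), results[best_site(results)])
```

For the three lines shown, the helper selects the cytoplasm entry, which is the only one marked "Affirmative".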

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 249> which encodes the amino acid 
sequence <SEQ ID 250>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3467 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 77/99 (77%) , Positives = 89/99 (89%) 
Query: 1 MRLINTTSSHPELWNQLQmTIAKL\^VTSAGigTDWFTKAPKHYELLISNKYRAIKDEE 60 

MRLINTTSSHPEL++NQL+NTDA LVEVYSAGNTDV+FT+APKHYELLISNKYRAIK++E 
Sbjct: 1 MRLINTTSSHPELIFMQLKNTOAYLVEWSAGNTDVIPTQAPKHYELLISNKYRAIKEDE 60 

Query: 61 LEAIREFFLKRKIDQSIIIQEQMKSLHTAKLIEISYPTT 99 

L+ IREFFLKRKID I+I Q K+LHT LIEIS+ T+ 
Sbjct: 61 LDIIREFFLKRKIDPKIVIPGQSKTLHTHNLIEISFQTS 99 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 76 

A DNA sequence (GBSx0076) was identified in S.agalactiae <SEQ ID 251> which encodes the amino acid 
sequence <SEQ ID 252>. This protein is predicted to be AP4A hydrolase. Analysis of this protein sequence 
reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1714 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC0S510 GB:AE000676 AP4A hydrolase [Aquifex aeolicus] 
Identities = 30/101 (29%) , Positives = 48/101 (46%) , Gaps = 2/101 (1%) 

Query: 32 KIILVQAPNGAWFLPGGEIEENENHLEALTRELIEELGYSATIGHYYGQADEYFYSRHRD 91 

+++L++ P+ W P G IE E E RE+ EE G I Y G+ Y+Y+ + 
Sbjct: 16 EVLLIKTPSNVWSFPKGNIEPGEKPEETAVREVWEETGVKGEILDYIGEI-HYWYTLKGE 74 

Query: 92 TYYYNPAYIYEVTAYHKDQAPLEDFNHLAWFPIQEAKEKLK 132 

+ Y Y + + P + +FPI+EAK+ LK 
Sbjct: 75 RIFKTVKY-YLMKYKEGEPRPSWEVKDAKFFPIKEAKKLLK 114 

A related DNA sequence was identified in S.pyogenes <SEQ ID 253> which encodes the amino acid 
sequence <SEQ ID 254>. Analysis of this protein sequence reveals the following: 

Possible site: 47 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1954 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 102/149 (68%) , Positives = 118/149 (78%) 

Query: 1 MTNPTFGEKIDNVNYRSRFGVYAIIPNPTHDKIILVQAPNGAWFLPGGEIEENENHLEAL 60 

M PTFG K + +Y +R+GVYAIIPN KIILVQAPNG+WFLPGGEIE E L+AL 
Sbjct: 1 MMIPTFGHKNAHKDYVTRYGVYAIIPNHEQTKIILVQAPNGSWFLPGGEIEAGEGQLQAL 60 

Query: 61 TRELIEELGYSATIGHYYGQADEYFYSRHRDTYYYNPAYIYEVTAYHKDQAPLEDFNHLA 120 

RELIEELG+SATIG YYGQADEYFYSRHRDT++Y+PAY+YEVTA+ PLEDFN+L 
Sbjct: 61 ERELIEELGFSATIGSYYGQADEYFYSRHRDTHFYHPAYLYEVTAFQAVSKPLEDFNNLG 120 

Query: 121 WFPIQEAKEKLKRGSHRWGVQAWEKNHHS 149 

WF EA KLKR SH+WGV+ W+K HHS 
Sbjct: 121 « 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 77 

A DNA sequence (GBSx0077) was identified in S.agalactiae <SEQ ID 255> which encodes the amino acid 
sequence <SEQ ID 256>. This protein is predicted to be ClpE (clpB-1). Analysis of this protein sequence 
reveals the following: 

Possible site: 54 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2882 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD01782 GB:AF023421 ClpE [Lactococcus lactis] 
Identities = 560/752 (74%), Positives = 647/752 (85%), Gaps = 12/752 (1%) 

Query: 1 MLCQNCKLNESTIHLYTNVNGKQKQVDLCQNCYQI1KTDPNNPLFSGLNHVS-HAPGGIN 59 

MLCQNC +NE+TIHLYT+VNG++EQ+DLCQNCYQI+K+ LF N + ++ N 

Sbjct: 1 MLCQNCNINEATIHLYTSVNGQKKQIDLCQNCTQIMKSGGQEALFGAGNASNGNSDEPFN 60 

Query: 60 PFFDDFFGDIOTFRAFNGQDLPKTPPTQSGGNRGGGNGNGRNNNRNQTATPSQAKGILEE 119 

PF +D F L + FNG TPPTQ+GG G N R Q KG+LEE 

Sbjct: 61 PF-NDIFSALQG-QDFNGAASNQTPPTQTGGRGPRGPQNPR AKQPKGMLEE 109 

Query: 120 FGINVTEIARHGDIDPVIGRDSEIIRVIEILNRRTKNNPVIiIGEPGVGKTAVVEGLAQKI 179 

FGIN+TE AR G+IDPVIGRD EI RVIEILNRRTKNNPVLIGEPGVGKTAWEGLAQKI 
Sbjct: 110 FGINITESARRGEIDPVIGRDEEISRVIEILTOIRTKNNPVLIGEPGVGKTAVVEGLAQKI 169 

Query: 180 VDGNVPHKLQGKQVIRLDWSLVQGTGIRGQFEERMQKLMEEIRQRQDVILFIDEIHEIV 239 

VDG+VP KLQ K+VIRLDWSLVQGTGIRGQFEERMQKLM+EIR+R DVI+FIDEIHEIV 
Sbjct: 170 VDGDVPQKLQNKEVIRLDWSLVQGTGIRGQFEERMQKLMDEIRKRNDVIMFIDEIHEIV 229 

Query: 240 GAGTAGEGS^AGNILKPALARGELQLVGATTLNEYRIIEKDAALERRMQPVKVDEPSVE 299 

GAG+AG+G+MDAGNILKPAI1ARGELQLVGATTI1NEYRIIEKDAALERRMQPVKVDEPSV+ 
Sbjct: 230 GAGSAGDGNMDAGNILKPALARGELQLVGATTLNEYRIIEKDAALERRMQPVICVDEPSVD 289 

Query: 300 ETITILKGIQKKYEDYHHVKYNNDAIFAAAVLSNRYIQDRFLPDI^IDLLDEAGSIO>OTLT 359 

ETITIL+GIQ +YEDYHHVKY 4+AIEAAA LSNRYIQDRFLPDKAIDLLDE+GSK NLT 
Sbjct: 290 ETITILRGIQARYEDYHHVKYTDEAIEAAAHLSNRYIQDRFLPDKAIDLLDESGSKKNLT 349 

Query: 360 LNFYDPKEIDQRLIEAENLKAQATREEDYERAAYFRDQIAICYKEMQQQKVDDQDTPIITE 419 

L FVDP++I++R+ +AE+ K +AT+ ED+E+AA+FRDQI+K +E+Q+Q+V D+D P+ITE 
Sbjct: 350 LKFVDPEDINRRIADAESKKNEATKAEDFEKA&HFRDQISKLRELQKQEVTDEDMPVITE 409 

Query: 420 KTIEHIIEEKTNIPVGDLKEKEQSQLINLADDLKQHVIGQDDAVVKIAKAIRRNRVGLGS 479 

K IE I+E+KT IPVGDLKEKEQ+QLINIADDLK HVIGQD+AV KI+KAIRR+RVGLG 
Sbjct: 410 KDIEQIVEQKTQIPVGDLKEKEQTQLINLADDLKAHVIGQDEAVDKISKAIRRSRVGLGK 469 

Query: 480 PNRPIGSFLFVGPTGVGKTELSKQLAIELFGSADSMIRFDMSEYMEKHAVAKLVGAPPGY 539 

PNRPIG FLFVGPTGVGKTEL+KQLA ELFGS-r+SMIRFDMSEYMEKH+VAKL+GAPPGY 
Sbjct: 470 PNRPIGFFLFVGPTGVGKTELAKQLAKELFGSSESMIRFBMSEYMEKHSVAKLIGAPPGY 529 



Query: 540 VGYEEAGQLTEKVRRNPYSLILLDEIEKAHPDVMHMFLQVLDDGRLTDGQGRTVSFKDTI 599 

VGYEEAGQLTE+VRRNPYSLILLDEIEKAHPDVMHMFLQ+L+DGRLTD QGRTVSFKD++ 
Sbjct: 530 VGYEEAGQLTERVRRNPYSLILLDEIEKAHPDVMHMFLQILEDGRLTDAQGRTVSFKDSL 589 

Query: 600 IIMTSNAGSGKTEASVGFGASREGRTNSVLGQLGNFFSPEFMNRFDGIIEFKALDKENLL 659 

IIMTSNAG+GK EASVGFGA+REGRT SVLGQLG+FFSPEFMNRFDGIIEF AL KENLL 
Sbjct: 590 IIMTS]mGTGCTl^VGFGAAREGRTKSVlGQLGDFFSPEF^IWRFDGIIEFSALSKElJLL 649 

Query: 660 NIVDIMLSDWARIAINGIHLDVTDKV1<EKLVDIjGYI)PKMGARPLRRTIQEHIEDAITDY 719 

IVD+ML +VN ++ N IHL VT KEKLVDLGY+P MGARPLRR IQE+IED+I D+ 
Sbjct: 650 KIVDLMLDEVNEQIGRNDIHLSVTQAAKEKLVDI^YNPAMGARPLRRIIQEMIEDSIADF 709 

Query: 720 YLENPSEKELRAIMTSNGNIIIKSSKKTEEST 751 

Y+E+P K+L A + + +1 +++T E+T 
Sbjct: 710 YIEHPEYKQLVADLIDDKIVISNQTQETAETT 741 

A related DNA sequence was identified in S.pyogenes <SEQ ID 257> which encodes the amino acid 
sequence <SEQ ID 258>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3104 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 640/751 (85%), Positives = 691/751 (91%), Gaps = 7/751 (0%) 

Query: 1 MLCQNCKLNESTIHLYTNVNGKQKQVDLCQNCYQIIKTDPNNPLFSGLNHVSHAPG-GIN 59 
MLCQNC LNESTIHLYT+VNGKQ+QVDLCQNCYQI+K+DP N + +GL A + 

PFtIJDFb\3DLNNFRAF 

+LEEFGINVT+IAR+G+IDPVIGRD E 

AQKI+DG VP KLQGKQVIRLDWSLVQGTGIRGQFEERMQKLMEEIR R+DVILFIDEI 

PSVEETITILKGIQ KYEDYHHVKY+ AIEAAA LSNRYIQDRFLPDKAIDLLDEAGSK 

MNLTLNFVDPKEID+RLIEAENLKaQATR+EDYERAAYFRDQI KYKEMQ QKVD+QD P 

IITEKTIE I+E+KTNIPVGDLKEKEQSQL+RLA+DLK HVIGQDDAV KIAKAIRRNRV 

Query: 536 PPGYVGYEEAGQLTEKVRRNPYSLILLDEIEKAHPDVMHMFLQVLDDGRLTDGQGRTVSF 595 
PPGY+GYEEAGQLTE+VRRNPYSLILLDE+EKAHPDVMHMFLQVLDDGRLTDGQGRTVSF 
Sbjct: 539 PPGYIGYEEAGQLTEQVRRNPYSLILLDEVEKAHPDVMHMFLQVLDDGRLTDGQGRTVSF 598 

Query: 596 KDTIIIMTSNAGSGKTEASVGFGASREGRTNSVLGQLGNFFSPEFMNRFDGIIEFKALDK 655 
KDTIIIMTSNAG+GK+EASVGFGA+REGRT+SVLG+L NFFSPEFMNRFDGIIEFKAL K 
Sbjct: 599 KDTIIIMTSNAGTGKSEASVGFGAAREGRTSSVLGELSNFFSPEFMNRFDGIIEFKALSK 658 

Query: 656 ENLLNIVDIMLSDVNARLAINGIHLDVTDKVKEKLVDLGYDPKMGARPLRRTIQEHIEDA 715 
E+LL+IVD+ML DVN RL NGIHLDVT KVKEKLVDLGYDPKMGARPLRRTIQ++IEDA 
Sbjct: 659 EHLLHIVDLMLEDVNERLGYNGIHLDVTQKVKEKLVDLGYDPKMGARPLRRTIQDYIEDA 718 

Query: 716 ITDYYLENPSEKELRAIMTSNGNIIIKSSKK 746 

ITDYYLE+P+EK+LRA+MT++ NI IK+ K+ 
Sbjct: 719 ITDYYLEHPTEKQLRALMTNSENITIKAVKE 749 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 78 

A DNA sequence (GBSx0078) was identified in S.agalactiae <SEQ ID 259> which encodes the amino acid 
sequence <SEQ ID 260>. This protein is predicted to be glutamine ABC transporter, permease protein 
(glnP). Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -9.92 Transmembrane 27 - 43 ( 15 - 46)

INTEGRAL Likelihood = -2.50 Transmembrane 200 - 216 ( 195 - 217)

Final Results

bacterial membrane --- Certainty=0.4970 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
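
The INTEGRAL lines above follow a fixed pattern, so the predicted transmembrane segments can be tabulated automatically. The Python sketch below is illustrative only: the `Segment` class and `parse_segments` function are hypothetical names, and the parenthesised pair is treated simply as a second range whose exact meaning is not stated in this text.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    likelihood: float   # value after "Likelihood ="
    start: int          # first residue of the "Transmembrane a - b" range
    end: int            # last residue of that range
    outer_start: int    # first number of the parenthesised range
    outer_end: int      # second number of the parenthesised range

    @property
    def length(self):
        """Number of residues in the start..end range (inclusive)."""
        return self.end - self.start + 1


SEGMENT_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(lines):
    """Collect one Segment per INTEGRAL line; other lines are ignored."""
    segments = []
    for line in lines:
        match = SEGMENT_RE.search(line)
        if match:
            likelihood = float(match.group(1))
            numbers = [int(value) for value in match.groups()[1:]]
            segments.append(Segment(likelihood, *numbers))
    return segments


segments = parse_segments([
    "INTEGRAL Likelihood = -9.92 Transmembrane 27 - 43 ( 15 - 46)",
    "INTEGRAL Likelihood = -2.50 Transmembrane 200 - 216 ( 195 - 217)",
])
for segment in segments:
    print(segment.start, segment.end, segment.length, segment.likelihood)
```

Both segments listed for this protein span 17 residues (27 to 43 and 200 to 216).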

A related GBS nucleic acid sequence <SEQ ID 9619> which encodes amino acid sequence <SEQ ID 9620> 
was also identified. 



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB91000 GB:AE001090 glutamine ABC transporter, permease protein 
(glnP) [Archaeoglobus fulgidus] 
Identities = 92/209 (44%) , Positives = 129/209 (61%) , Gaps = 10/209 (4%) 

Query: 17 YGVMVTIMISTCVVFFGTIIGVLIALVKRTNLHFLTIIANFYVWVFRGTPMVVQIMIAFA 76 

+G VT+ ++ +FFG IIG + L + + ++ YV V RGTP++VQI+I + 
Sbjct: 21 FGASVTLKLTLISIFFGLIIGTIAGLGRVSKNPLPFAISTAYVEVIRGTPLLVQILIVYF 80 

Query: 77 WMHFNNLPTISFGVLDLDFTRLLPGIIIISLNSGAYISEIVRAGIEAVPSGQIEAAYSLG 136 

LP I + GII +S+ SGAYI+EIVRAGIE++P GQ+EAA SLG 
Sbjct: 81 GLPAIGINLQPEP AGIIALSICSGAYIAEIVRAGIESIPIGQMEAARSLG 130 

Query: 137 IRPKNTLRYVILPQAFKNILPALGNEFITIIKDSALLQTIGVMELWNGAQSVVTATYSPV 196 

+ +RYVI PQAF+NILPALGNEFI ++KDS+LL I ++EL + +V T++ 
Sbjct: 131 MTYLQAMRYVIFPQAFRNILPALGNEFIALLKDSSLLSVISIVELTRVGRQIVNTTFNAW 190 

Query: 197 APLLFAAFYYLMLTTILSALLKQMEKYLG 225 

P L A +YLM+T LS L+ +K LG 
Sbjct: 191 TPFLGVALFYLMMTIPLSRLVAYSQKKLG 219 



A related DNA sequence was identified in S.pyogenes <SEQ ID 261> which encodes the amino acid 
sequence <SEQ ID 262>. Analysis of this protein sequence reveals the following: 

Possible site: 30 
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -9.08 Transmembrane 25 - 41 ( 11 - 44)
INTEGRAL Likelihood = -1.91 Transmembrane 202 - 218 ( 201 - 218)

Final Results

bacterial membrane --- Certainty=0.4630 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAB91000 GB:AE001090 glutamine ABC transporter, permease protein 
(glnP) [Archaeoglobus fulgidus] 
Identities = 91/209 (43%), Positives = 138/209 (65%), Gaps = 12/209 (5%) 

Query: 15 YGvLVTIMISVSWFFGTLIGVLVTLIKRSHVKPLTWVVNL-YvWIFRGTPMVVQIMIAF 73 

+G VT+ +++ +FFG +IG + L + S PL + ++ YV + RGTP++VQI+I + 
Sbjct: 21 FGASVTLKLTLISIFFGLIIGTIAGLGRVSK-NPLPFAISTAYVEVIRGTPLLVQILIVY 79 

Query: 74 AWMHFNNMPTIGFGVLDLDFSRLLPGIIIISLNSGAYISEIVRAGIEAVPKGQLEAAYSL 133 

+P IG ++ Gil +S+ SGAYI+EIVRAGIE++P GQ+EAA SL 

Sbjct: 80 F GLPAIG INLQPEPAGIIALSICSGAYIAEIVRAGIESIPIGQMEAARSL 129 

Sbjct: 

Query: 194 ISPLLVAAFYYLMVTTVMAQLLAVLERHM 222 

+P L A +YLM+T +++L+A ++ + 
Sbjct: 190 WTPFLGVALFYLMMTIPLSRLVAYSQKKL 218 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 180/225 (80%) , Positives = 208/225 (92%) 

Query: 3 MNFSFLPQYWSYFNYGVTWTIMISTCTA/FFGTIIGVLIAJLVKRTNLHFLTILANFYVWV'F 62 

M+ SFLP+YW+YFNYGV+VTIMIS WFFGT+IGVL+ L+KR+++ LT + N YVW+F 
Sbjct: 1 MDLSFLPKYWAYFNYGVLVTIMISVSWFFGTLIGVLVTLIKRSHVKPLTWWNLYVWIF 60 

Query: 63 RGTPMWQIMIAFAWMHFNNLPTISFGVLDLDFTRLLPGIIIISLNSGAYISEIVRAGIE 122 

RGTPMWQIMIAFAWMHFNN+PTI FGVLDLDF+RLLPGIIIISLNSGAYISEIVRAGIE 
Sbjct: 61 RGTPMWQIMIAFAWMHFNNMPTIGFGVLDLDFSRLLPGIIIISLNSGAYISEIVRAGIE 120 

Query: 123 AVPSGQIEAAYSLGIRPKNTLRYVILPQAFKNILPALGNEFITIIKDSALLQTIGVMELW 182 

AVP GQ+EAAYSLGIRP+N +RYVILPQAFKNILPALGNEFITIIKDSALLQTIGVMELW 
Sbjct: 121 AVPKGQLEAAYSLGIRPQNAMRYVILPQAFKNILPALGNEFITIIKDSALLQTIGVMELW 180 

Query: 183 NGAQSWTATYSPVAPLLFAAFYYLMLTTILSALLKQMEKYLGKG 227 

NGAQSWTATYSP++PLL AAFYYLM+TT+++ LL +E+++ +G 
Sbjct: 181 NGAQSVVTATYSPISPLLVAAFYYLMVTTVMAQLLAVLERHMAQG 225 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 79 

A DNA sequence (GBSx0079) was identified in S.agalactiae <SEQ ID 263> which encodes the amino acid 
sequence <SEQ ID 264>. This protein is predicted to be phosphomannomutase (manB). Analysis of this 
protein sequence reveals the following: 

Possible site: 60 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5400 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9621> which encodes amino acid sequence <SEQ ID 9622> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 



M4-+++ Y++W + L ++K LEAI GDE +++D ?YK LEFGT G+RG++G G NRM 

NTY 4- KA++ A +++ G A+G+ ++YD R++S EFA + +GIK+Y+++ 

+RPTP S+A+R LG G++ITASHNP YNG+K Y +G Q+ 

Query: 184 TDYQQIKQIPFEEALASGSASYIDESIEEAYKKEVLGLTINDTNID-- 
E +G+ I E ++ AY + + + +N ++ 

NLPVR VL GFENV W EQE+PDP F4TV PNPE AFA + GK +AD+ 

- I V+TLTGFK I K EY+ S -I 



A related DNA sequence was identified in S.pyogenes <SEQ ID 265> which encodes the amino acid 
sequence <SEQ ID 266>. Analysis of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5497 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 
Identities = 470/564 (83%) , Positives = 517/564 (91%) 

Query: 1 MSHMNYKEIYQEWLENDSLGKDIKSDLEAIKGDESEIQDRFYKTLEFGTAGLRGKLGAGT 60 
MS+M Y E+YQEWL N+ L DIK+DL AIK +E+EIQDRFYKTLEFGTAGLRGKLGAGT 
Sbjct: 1 MSNMTYNEVYQEWLHNNDLSDDIKADLAAIKDNEAEIQDRFYKTLEFGTAGLRGKLGAGT 60 

Query: 61 NRMNTYMVGKAAQALANTIIDHGPEAIARGIAVSYDVRYQSKEFAELTCSIMAANGIKSY 120 

NRMNTYMVGKAAQALANTIIDHGPEA+ +GIAVSYDVRYQS+ FAELTCSIMAANGIK+Y 
Sbjct: 61 NRMNTYMVGKAAQALANTIIDHGPEAVXKGIAVSYDVRYQSRTFAELTCSIMAANGIKAY 120 

Query: 121 IYKGIRPTPMCSYAIRALGCVSGVMITASHNPQAYNGYKAYWKEGSQILDDIADQIANHM 180 

+YKGIRPTPMCSYAIRALGC+SGVMITASHNPQAYNGYKAYW+EGSQILDDIADQIA HM 
Sbjct: 121 LYKGIRPTPMCSYAIRALGCISGVMITASHNPQAYNGYKAYWQEGSQILDDIADQIAQHM 180 

Query: 181 DAITDYQQIKQIPFEEALASGSASYIDESIEEAYKKEVLGLTINDTNIDKSVRWYTPLN 240 

A+T YQ+IKQ+PFE+AL SG +YIDESIEEAYKKEVLGLTINDT+IDKSWWYTPLN 
Sbjct: 181 AALTQYQEIKQMPFEKALDSGLVTYIDESIEEAYKKEVLGLTINDTDIDKSVRWYTPLN 240 

Query: 241 GVGNLPVREVLRRRGFENVYWPEQEMPDPDFTTVGYPNPEVPKAFAYSESLGKSVDADI 300 
GVGNLPVREVLRRRGFENVYWPEQEMPDPDFTTVGYPNPEVPK FAYSE LGK+VDADI 

Sbjct: 241 GVGNLPVREVLRRRGFENVYWPEQEMPDPDFTTVGYPNPEVPKTFAYSEKLGKAVDADI 300 

Query: 301 LLATDPDCDRVALEVKDSKGEYIFLNGNKIGALLSYYIFSQRCALGNLPHHPVLVKSIVT 360 
L+ATDPDCDRVALEVK++ G+Y+FLNGNKIGALLSYYIFSQR LGNLP +PVLVKSIVT 
Sbjct: 301 LIATDPDCDRVALEVKNAVGDYVFLNGNKIGALLSYYIFSQRFDLGNLPANPVLVKSIVT 360 

Query: 361 GDLSKVIADKYNIETVETLTGFKNICGKANEYDISKDKTYLFGYEESIGFCYGTFVRDKD 420 

GDLS+ IA Y IETVETLTGFKNICGKANEYD++K K YLFGYEESIGFCYGTFVRDKD 
Sbjct: 361 GDLSRAIASHYGIETVETLTGFKNICGKANEYDVTKQKNYLFGYEESIGFCYGTFVRDKD 420 

Query: 421 AVSASI#WVEMTAYYI^ERGQTLLDVLQTIYDKFGYYNERQFSLELEGAEGQERISRIMED 480 

AVSASMM+VEM AYYK++GQ LLDVLQTIY FGYYNERQ +LELEG EGQ+RI+RIMED 
Sbjct: 421 AVSASMMITOI^YYKKKGQNLLDVXQTIYATFGYYNERQIAIaELEGIEGQKRIARIMED 480 

Query: 481 FRQDPILQVGEMTLENSIDFKDGYKDFPKQNCLKYYFNEGSWYALRPSGTEPKIKCYLYT 540 

FRQ PI V EM L+ +IDF DGY+DFPKQNCLK+Y ++GSWYALRPSGTEPKIK YLYT 
Sbjct: 481 FRQTPIASVAEMALDKTIDFIDGYQDFPKQNCLKFYLDDGSWYALRPSGTEPKIKFYLYT 540 

Query: 541 IGCTEADSLSKLNAIESACRAKMN 564 
IG T+ +S +KL+AIE+ACR K+N 

Sbjct: 541 IGQTQENSATKLDAIEAACRTKIN 564 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 80 

A DNA sequence (GBSx0080) was identified in S.agalactiae <SEQ ID 267> which encodes the amino acid 
sequence <SEQ ID 268>. This protein is predicted to be methylenetetrahydrofolate dehydrogenase (folD). 
Analysis of this protein sequence reveals the following: 

Possible site: 48 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4672 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC44612 GB:U58210 tetrahydrofolate dehydrogenase/cyclohydrolase 
[Streptococcus thermophilus] 

Identities = 209/282 (74%) , Positives = 248/282 (87%) 

Query: 1 MTELIDGKALSQKMQAELGRKVERLKEQHGIIPGLAVILVGDNPASQVYVRNKERSALEA 60 
M ++DGKAL+ MQ +L KV RLKE+ I+PGL VI+VG+NPASQVYVRNKER+A +A 
Sbjct: 1 MAI I^GKALAVIMQEQLQEKVARLKEKEWIVPGLWIWGENPASQVYVRNKERAAKKA 60 

Query: 61 GFKSETLRLSESISQEELIDIIHQYNEDKSIHGILVQLPLPQHINDKKIILAIDPKKDVD 120 
GF S+T+ LSESIS+EELI++I +YN++ HGILVQLPLP HIN+ +I+LAIDPKKDVD 
Sbjct: 61 GFHSKTVNLSESISEEELIEVIEKYNQNPLFHGILVQLPLPNHINEMRILLAIDPKKDVD 120 

Query: 121 GFHPMNTGHLWSGRPMMVPCTPAGIMEMFREYHVDLEGKHAVIIGRSNIVGKPMAQLLLD 180 
GFHPMNTG+LW+GRP MVPCTPAGIME+ REY+V+LEGK AVIIGRSNIVGKPMAQLLL+ 
Sbjct: 121 GFHPMNTGNLWNGRPQMVPCTPAGIMEILREYNVELEGKTAVIIGRSNIVGKPMAQLLLE 180 

Query: 181 KNATVTLTHSRTRNLSEVTKEADILIVAIGQGHFVTKDFVKEGAVVIDVGMNRDENGKLI 240 
KNATVTLTHSRT +L++V +AD+LIVAIG+ FVT++FVKEGAVVIDVG+NRDE GKL 
Sbjct: 181 KNATVTLTHSRTPHLAKVCMKADVLIVAIGRAKFVTEEFVKEGAVVIDVGINRDEEGKLC 240 

Query: 241 GDWFEQVAEVASMITPVPGGVGPMTITMLLEQTYQAALRSV 282 
GDV F+QV E SMITPVPGGVGPMTITML+EQTYQAALRS+ 
Sbjct: 241 GDVDFDQVKEKVSMITPVPGGVGPMTITMLMEQTYQAALRSL 282 

A related DNA sequence was identified in S.pyogenes <SEQ ID 269> which encodes the amino acid 
sequence <SEQ ID 270>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3368 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 230/281 (81%) , Positives = 257/281 (90%) 

Query: 1 MTELIDGKALSQKMQAELGRKVERLKEQHGIIPGLAVILVGDNPASQVYVRNKERSALEA 60 

MTELIDGKAL+QKMQ EL KV LK++ GI+PGLAVILVGD+PASQVYVRNKER+AL 
Sbjct: 3 MTELIDGKALAQKMQQELAAKVNNLKQKKGIVPGLAVILVGDDPASQVYVRNKERAALTV 62 

Query: 61 GFKSETLRLSESISQEELIDIIHQYNEDKSIHGILVQLPLPQHINDKKIILAIDPKKDVD 120 

GFKSET+RLSE I QEELI +I +YN D +IHGILVQLPLP HINDKKIILAIDPKKDVD 
Sbjct: 63 GFKSETVRLSEFICQEELIAVIERYNADNTIHGILVQLPLPNHINDKKIILAIDPKKDVD 122 

Query: 121 GFHPMNTGHLWSGRPMMVPCTPAGIMEMFREYHVDLEGKHAVIIGRSNIVGKPMAQLLLD 180 

GFHPMNTGHLWSGRP+MVPCTP+GIME+ REY+V+LEGKHAVIIGRSNIVGKPMAQLLLD 
Sbjct: 123 GFHPMNTGHLWSGRPLMVPCTPSGIMELLREYNVNLEGKHAVIIGRSNIVGKPMAQLLLD 182 

Query: 181 KNATVTLTHSRTRNLSEVTKEADILIVAIGQGHFVTKDFVKEGAVVIDVGMNRDENGKLI 240 

KNATVTLTHSRTR L EV + AD+LIVAIGQGHF+TK ++K+GA+VIDVGMNRD+NGKLI 
Sbjct: 183 KNATVTLTHSRTRQLEEVCRCADVLIVAIGQGHFITKQYIKDGAIVIDVGMNRDDNGKLI 242 

Query: 241 GDWFEQVAEVASMITPVPGGVGPMTITMLLEQTYQAALRS 281 

GDV F++VAEVA+ ITPVPGGVGPMTI MLLEQTYQ+ALRS 
Sbjct: 243 GDVAFDEVAEVAAKITPVPGGVGPMTIAMLLEQTYQSALRS 283 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 81 

A DNA sequence (GBSx0081) was identified in S.agalactiae <SEQ ID 271> which encodes the amino acid 
sequence <SEQ ID 272>. Analysis of this protein sequence reveals the following: 






Possible site: 39

INTEGRAL

Final Results

bacterial membrane --- Certainty=0.2296 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9623> which encodes amino acid sequence <SEQ ID 9624> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 



Query: 1 MIVGEQEARMilKPRPKSSHKGDYGSVLLlGGFYPYGGAIIMAMAOTKTGAGLVTVATQ 60 
M V + R +I+PR + SHKG YG VLL+GG YPYGGAIIMAA+ACV +GAGLVTVAT 
Sbjct: 1 MKVDDDLWQVIRPRLRGSHKGSYGRVLLVGGLYPYGGAIIMAAIACVNSGAGLVTVATD 60 

NI +LH+ LPE MAFD + + + +DVI+IG GLG E++ 

+Q++++DGSAL LL++ +L+LTPHQKEWERLSG+A+S+Q+ NTQ AL+ F 

GTJLVAKS T ++Q + + VC-GPYQATGGXGDTL GM+AG LAQF 

+LHSAIA +++ AYWLPT IS IP M +LS 


No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 272 (GBS413) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 79 (lane 2; MW 34.2kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 171 (lane 7; MW 59kDa). 

GBS413-GST was purified as shown in Figure 218, lane 12. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 82 

A DNA sequence (GBSx0082) was identified in S.agalactiae <SEQ ID 273> which encodes the amino acid 
sequence <SEQ ID 274>. This protein is predicted to be Exonuclease VII large subunit (xseA). Analysis of 
this protein sequence reveals the following: 
Possible site: 36 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3172 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB14361 GB:Z99116 similar to exodeoxyribonuclease VII (large 
subunit) [Bacillus subtilis] 
Identities = 193/446 (43%) , Positives = 283/446 (63%) , Gaps = 10/446 (2%) 

Query: 4 YLSVSTLTKYLKLKFDKDPYLERVYLTGQVSNFR-RRPNHQYFSLKDDKSVIQATMWSGH 62 

Y++VS LTKY+K KFD DP+LE +++ G++SN + H YF+LK+ K +Q+ M++ 

Sbjct: 6 YVTVSALTKYIKRKFDVDPHLENIWIKGELSNVKIHTRGHIYFTLKERKGRMQSVMFARQ 65 

Query: 63 FKKLGFELEEGMKVNVVGRVQLYEPSGSYSIIVEKAEPDGIGMAIQFEQLKKKLSQAGY 122 

++L F+ E GMKV V G + +YEPSG+Y + ++ +PDG+GAL + +E+LKKKL+ G 
Sbjct: 66 SERLPFKPENGMKVLWGGISVYEPSGNYQLYAKEMQPDGVGALYLRYEELKKKLRGEGL 125 

Query: 123 FDDRHKQLIPQFVRKIGWTSPSGAVIRDIITTVSRRFPGVEILLFPTKVQGEGAAQEIA 182 

FDDR+K+ IP F IGWTSP+GA +RD+ITT+ RR+P V+++4 P VQGE A++ I 
Sbjct: 126 FDDRYKKQIPAFPATIGWTSPTGAAVRDVITTLKRRYPLVKVIVLPALVQGENASRSIV 185 

Query: 183 QTIALANEKKDLDLLIVGRGGGSIEDLWAFNEECVVEAIFESRLPVISSVGHETDTTLAD 242 

I ANEK+ D+LIVGRGGGSIE+LWAFNEE V AIF S +P+IS+VGHETD T++D 
Sbjct: 186 TRIEEANEKEICDVLIVGRGGGSIEELWAFNEEIVARAIFASNIPIISAVGHETDFTISD 245 

Query: 243 FVADRRAATPTAAAELATPVTKIDILSWITERENRMYQSSLRLIRTKEERLQKSKQSVIF 302 

FVAD RAATPT AAE+A P T D++ E RM ++ + + ++ R+Q + S F 

Sbjct: 246 FVADIRAATPTGAAEIAVPHT-TDLIERTKTAEVRMTRAMQQHLGQEKGRIQTLQSSYAF 304 

Query: 303 RQPERLYDGFLQKLD NLNQQLTYSMRDKLQTVRQKQGLLHQKLQGIDLKQRIHIYQ 358 

R P+RLY Q+ D QLT + K + + ++ h LKQ YQ 

Sbjct: 305 RFPKRLYAQKEQQFDLAYQQFQAQLTALLDRKSRQLERETYRLEALHPHEQLKQARTRYQ 364 

Query: 359 ERWQSRRLLSSTMTSQYDSKLARFEKAQDALISLDSSRIVARGYAIIEKNHTLVSTTNG 418 

E+ Q R+ M Q ++F+ L +L +++ RGY++ K L+ + + 

Sbjct: 365 EQTNQLRK NMHIQMKQLHSQFQTVLGKLHALSPLQVMERGYSLAYKEDKLIKSVSQ 420 

Query: 419 INEGDHLQVKMQDGLLEVEVKDVRQE 444 

I E D L++K++DG+L EV + R E 
Sbjct: 421 IEEQDRLEIKLKDGVLTCEVLEKRGE 446 
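
Each alignment in this section is preceded by a summary line giving identities, positives and, where present, gaps as fractions of the aligned length. As a sketch only, the Python snippet below extracts those fractions; `parse_summary` is a hypothetical helper name, and the pattern assumes the `Name = n/m (p%)` layout seen in this document.

```python
import re

# One match per field, e.g. "Identities = 193/446 (43%)".
FIELD_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_summary(line):
    """Return {field: (count, aligned_length, reported_percent)}."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in FIELD_RE.findall(line)
    }


summary = parse_summary(
    "Identities = 193/446 (43%) , Positives = 283/446 (63%) , Gaps = 10/446 (2%)"
)
for name, (count, length, percent) in summary.items():
    print(name, count, length, percent)
```

Summary lines without a gaps field simply yield a dictionary with two entries.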

A related DNA sequence was identified in S.pyogenes <SEQ ID 275> which encodes the amino acid 
sequence <SEQ ID 276>. Analysis of this protein sequence reveals the following: 
Possible site: 61 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3275 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 321/446 (71%) , Positives = 386/446 (85%) 

Query: 1 MSDYLSVSTLTKYLKLKFDKDPYLERVYLTGQVSNFRRRPNHQYFSLKDDKSVIQATMWS 60 

M+DYL+V+ LTKYLKLKFD+DPYLERVYLTGQVSNFR+RP HQYFSLKD+ +VIQATMW+ 
Sbjct: 6 MADYLTVTHLTKYLKLKFDRDPYLERVYLTGQVSNFRKRPTHQYFSLKDESAVIQATMWA 65 

Query: 61 GHFKKLGFELEEGMKVNVVGRVQLYEPSGSYSIIVEKAEPDGIGALAIQFEQLKKKLSQA 120 

G +KKLGF+LEEGMK+NV+GRVQLYEPSGSYSI++EKAEPDGIGALA+QFEQLKKKL+ 
Sbjct: 66 GVYKKLGFDLEEGMKINVIGRVQLYEPSGSYSIVIEKAEPDGIGALALQFEQLKKKLTAE 125 

Query: 121 GYFDDRHKQLIPQFVRKIGVVTSPSGAVIRDIITTVSRRFPGVEILLFPTKVQGEGAAQE 180 

GYF+ +HKQ +PQFV KIGV+TSPSGAVIRDIITTVSRRFPGVEILLFPTKVQG+GAAQE 
Sbjct: 126 GYFEQKHKQPLPQFVSKIGVITSPSGAVIRDIITTVSRRFPGVEILLFPTKVQGDGAAQE 185 
ADFVADRRAATPTAAAELATP+TK D+4-SWI ER+NR YQ+ LR 3 

IFRQPERLYD +LQK+D L4 L +M+D+L + ++ + L L L+ +1 YQ+R 

V ++RLL + M SQYDS+LARFEKAQDAL+StD+SRI+ARGYA+IEKN 

+GD L 4-KM+DG L+VEVKDV+ EN I 
Sbjct: 426 KGDQLTIKMRDGQLDVEVKDVKNENI 451 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
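
The summary line printed above each alignment in these examples (for instance `Identities = 321/446 (71%) , Positives = 386/446 (85%)`) can be read mechanically. The following Python sketch is illustrative only and is not part of the original analysis: the names `parse_summary` and `truncated_percent` are hypothetical, and the idea that the identity percentage is truncated rather than rounded is an assumption inferred from 321/446 (71.97%) being printed as 71%.

```python
import re

# Matches fields such as "Identities = 321/446 (71%)". Whitespace around "=",
# "/" and before "(" is optional because the listings contain stray spaces.
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)

def parse_summary(line):
    """Return {field: (count, aligned_length, printed_percent)}."""
    return {
        name.lower(): (int(count), int(length), int(percent))
        for name, count, length, percent in FIELD.findall(line)
    }

def truncated_percent(count, length):
    """Integer percentage with the fraction dropped (assumed convention)."""
    return 100 * count // length

summary = parse_summary("Identities = 321/446 (71%) , Positives = 386/446 (85%)")
identity_count, aligned_length, printed = summary["identities"]
recomputed = truncated_percent(identity_count, aligned_length)
```

Comparing `recomputed` with `printed` gives a quick way to spot summary lines whose digits were damaged in transcription; the sketch makes no claim about how the positives percentage was derived.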

Example 83 

A DNA sequence (GBSx0083) was identified in S.agalactiae <SEQ ID 277> which encodes the amino acid
sequence <SEQ ID 278>. Analysis of this protein sequence reveals the following:

Possible site: 33

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2913 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAG07429 GB:AE004821 exodeoxyribonuclease VII small subunit 
[Pseudomonas aeruginosa] 
Identities = 26/66 (39%), Positives = 51/66 (76%), Gaps = 2/66 (3%) 

Query: 1 MSDKKT--FEENLQELETIVSRLETGDVALEDAIAEFQKGMLISKELQRTLKEAEETLVK 58

M+ KKT FE++L EL+T+V RLE+G+++LE+++ F++G+ +++E Q +L +AE+ +
Sbjct: 1 MARKKTLDFEQSLTELQTLVERLESGELSLEESLGAFEQGIRLTRECQTSLSQAEQKVQI 60

Query: 59 VMQADG 64

+++ DG 
Sbjct: 61 LLERDG 66 

A related DNA sequence was identified in S.pyogenes <SEQ ID 279> which encodes the amino acid
sequence <SEQ ID 280>. Analysis of this protein sequence reveals the following:

Possible site: 51

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2796 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 55/70 (78%), Positives = 65/70 (92%)

Query: 1 MSDKKTFEENLQELETIVSRLETGDVALEDAIAEFQKGMLISKELQRTLKEAEETLVKVM 60

MS KTFEENLQ+LETIV++LE GDV LE+AI+EFQKGML+SKELQ+TL+ AE+TLVKVM
Sbjct: 1 MSKTKTFEENLQDLETIVNKLENGDVPLSEAISEFQKGKLLSKELQKTLQAAEKTLVKVM 60

Query: 61 QADGTEVEMD 70 

QADGTEV+MD 
Sbjct: 61 QADGTEVDMD 70 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
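
Each "Final Results" block in these examples lists three candidate locations with a certainty value. As a reading aid, the sketch below shows one way such a block could be parsed and the highest-scoring location selected. It is an illustration under stated assumptions, not the prediction program itself: `parse_final_results` and `most_likely` are hypothetical names, and the tolerance for stray spaces inside the number is there only because the printed listings sometimes contain them.

```python
import re

# One result line looks like:
#   bacterial cytoplasm --- Certainty=0.2796 (Affirmative) < succ>
# \W* skips the separator dashes; spaces inside the number are tolerated.
LINE = re.compile(r"bacterial\s+(\w+)\W*Certainty\s*=\s*([0-9. ]+)\(([^)]*)\)")

def parse_final_results(text):
    """Return {location: (certainty, label)} for every result line in text."""
    results = {}
    for location, value, label in LINE.findall(text):
        results[location] = (float(value.replace(" ", "")), label.strip())
    return results

def most_likely(results):
    """Location with the highest certainty value."""
    return max(results, key=lambda location: results[location][0])

example = """
bacterial cytoplasm --- Certainty=0.2796 (Affirmative) < succ>
bacterial membrane --- Certainty=0. 0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
results = parse_final_results(example)
best = most_likely(results)
```

For the block shown, `best` is the cytoplasm entry, matching the line marked "Affirmative".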

Example 84 

A DNA sequence (GBSx0084) was identified in S.agalactiae <SEQ ID 281> which encodes the amino acid
sequence <SEQ ID 282>. Analysis of this protein sequence reveals the following:

Possible site: 58

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2614 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAA252S5 GB:AB003187 farnesyl diphosphate synthase [Micrococcus
luteus]
Identities = 126/258 (48%), Positives = 175/258 (66%), Gaps = 2/258 (0%)

Query: 27
Sbjct: 31
Query:
Sbjct: 90
Query: 147
Sbjct: 150
Query: 207
Sbjct: 210
Query: 267

i +AI YS+ GGKRIRP ++L L+ G DG ALEMIHT SLIHDDLPAMD
MV GQMLDM+GE+K L++++L IHI+KTG L+
+G FQ++DDILDV SFE +GKT D+ +K+TY +LLGL+ S+++L+D L +
r L+ N N + +1 I
Sbjct: 270 CTLQ-PINDNLKTLITYI 286

A related DNA sequence was identified in S.pyogenes <SEQ ID 283> which encodes the amino acid
sequence <SEQ ID 284>. Analysis of this protein sequence reveals the following:

Possible site: 38

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3887 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 192/289 (66%), Positives = 237/289 (81%)



Query: 2 MVTIEKIDEMHRYYKQTHSWSPDI.IKAILYSVDGGGKRIRPRILLEILEGFGVELIDG 61

M + +IDEAI RYYK T + VS +LI AILYSVD GGKRIRP ILLE++EGFGV L +
Sbjct: 1 MDKLARIDEAIRRYYKTTSNGVSEELIDAILYSVDSGGKRIRPLILLEMIEGFGVSLQNA 60

Query: 62 HYDVAAALEMIHTGSLIHDDLPAMDITODFRRGRLTKHKKFDEATAVLAGDSLFLDPFDLV 121

H+D+AAALEMIHTGSLIHDDLPAMDNDD+RRGRLTHHK+F EATA+LAGDSLFLDPF L+
Sbjct: 61 HFDIAAALEM1HTGSLIHDDLPAMDNDDYRRGRLTUHKQFGEATAILAGDSLFLDPFGLI 120

Query: 122 VKAGFKADVTVRLIELLSMSAGSFGMVGGQMLDMKGENKVLSIDDLSLIHINKTGRLLAY 181

+A ++V V LI+ LS+++G+FGMVGGQMLDMKGEN+ LS+ LSLIH+NKTG+LLA+
Sbjct: 121 AQAELNSEVKVALIQELSLASGTFGW/GGQMLDMKGENQALSLPQLSLIHLMKTGKLLAF 180

Query: 182 PFVAAGILAEKSEEVKGKLHQAGLLIGHAFQVRDDILDVTASFEELGKTPNKDIVAEKTT 241

PF AA ++ E++ V+ +L QAG+LIGHAFQ+RDDILDVTASFE+LGKTP KD+ AEK T
Sbjct: 181 PFKAAALITEQAMTVRQQLEQAGMLIGHAFQIRDDILDVTASFEDLGKTPKKDLFAEKAT 240

Query: 242 YPNLLGLDKSQEILDDTLKKAQAIFQNL.EKKANFNARKIIDIIEGLRLN 290

YP+LLGL+ S ++L ++L +A IFQ IiE F + I +IEGLRLN
Sbjct: 241 YPSLLGLEASYQLLTESLDQALTI FQTLESDVGFKPQ1 ITKLIEGLRLN 289





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 85 

A DNA sequence (GBSx0085) was identified in S.agalactiae <SEQ ID 285> which encodes the amino acid
sequence <SEQ ID 286>. This protein is predicted to be a hemolysin-like protein (tly). Analysis of this
protein sequence reveals the following:

Possible site: 37

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.75 Transmembrane 152 - 168 ( 151 - 168)

Final Results

bacterial membrane --- Certainty=0.1298 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:



Query: 63 VSRGGLKLEKALQVFEISVADKLTIDIGASTGGFTDVMLQSGARLVYAVDVGTNQLVWKL 122

VSRGGLKLEKA++ F++ + D++ +DIGASTGGFTD LQ+GA VYAVDVG NQL WKL
Sbjct: 62 VSRGGLKLEKAIRAFDLHLTDRVVLDIGASTGGFTDCALQNGATFVYAVDVGYNQLAWKL 121

Query: 123 RQDHRVRSMEQYNFRYAQKEDFKEGLPEFASIDVSFISLNLILPALKEILVDGGQVVALI 182

RQD RV ME+ NFRY + E + GLP A+IDVSFISL LILP LK +L++ VVAL+
Sbjct: 122 RQDERVVVMERTNFRYLKPEVLERGLPNMATIDVSFISLKLILPVLKTMLLENSDVVALV 181

Query: 243 HLQKCQDPQNLV-LDQIQDVIEKAHKEFKK 271

A related DNA sequence was identified in S.pyogenes <SEQ ID 287> which encodes the amino acid
sequence <SEQ ID 288>. Analysis of this protein sequence reveals the following: 

Possible site: 37

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.92 Transmembrane 150 - 166 ( 149 - 168)

Final Results

bacterial membrane --- Certainty=0.2168 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

Query: 3 KERVDVLAYKQGLFETREQAKRGVMAGLVVSVINGQRYDKPGDKIDDGTELKLKGEKLKY 62

KERVDVL ++GL ETRE+AKR +MAGLV S +R DKPG K+D T L +KGE L Y
Sbjct: 4 KERVDVLLVERGLMETREKAKRSIMAGLVFS--GHERVDKPGLKVDRDTPLSVKGEVLPY 61

Query: 63 VSRGGLKLEKGLHVFGVSVANQIGIDIGASTGGFTDVMLQDGAKLVYAVDVGTNQLVWKL 122

VSRGGLKLEK + F + + +++ +DIGASTGGFTD LQ+GA VYAVDVG NQL WKL
Sbjct: 62 VSRGGLKLEKAIRAFDLHLTDRVVLDIGASTGGFTDCALQNGATFVYAVDVGYNQLAWKL 121

Query: 123 RQDPRVRSMEQYNFRYAQPEDFNEGQPVFASIDVSFISLSLILPALHNVLSDQGQVIALI 182

RQD RV ME+ NFRY +PE G P A+IDVSFISL LILP L +L + V+AL+
Sbjct: 122 RQDERVVVMERTNFRYLKPEVLEHGLPNMATIDVSFISLKLILPVLKTMLLENSDVVALV 181

Query: 183 KPQFEAGREQIGKKGIVKDKQIHEKVIQKVTDFASGYGFTVKGLDFSPIQGGHGNIEFLA 242

KPQFEAGRE++GKKGIV+DK +H+KV+ ++ + FA G+ V GLDFSPI GG GNIEFL

Query: 243 HLAKSQTPET-LAPHLIQKVVAKAHKEFEK 271

HL + E+ ++ +1+ V +AH E +K
Sbjct: 242 HLMWRKDKESFISQEMIRDTVERAHLELKK 271

An alignment of the GAS and GBS proteins is shown below: 

Identities = 214/275 (77%) , Positives = 238/275 (85%) 

Query: 1 MAKERVDVLAYKQGLFDTREQAKRGVMAGKVINVINGERYDKPGEKVADDTELKLKGEKL 60

M KERVDVLAYKQGLF+TREQAKRGVMAG+V++VING+RYDKPG+K+ D TELKLKGEKL
Sbjct: 1 MPKERVDVLAYKQGLFETREQAKRGVMAGLVVSVINGQRYDKPGDKIDDGTELKLKGEKL 60

Query: 61 KYVSRGGLKLEKALQVFEISVADKLTIDIGASTGGFTDVMLQSGARLVYAVDVGTNQLVW 120

KYVSRGGLKLEK L VF +SVA+++ IDIGASTGGFTDVMLQ GA+LVYAVDVGTNQLVW
Sbjct: 61 KYVSRGGLKLEKGLHVFGVSVANQIGIDIGASTGGFTDVMLQDGAKLVYAVDVGTNQLVW 120

Query: 121 KLRQDHRVRSMEQYNFRYAQKEDFKEGLPEFASIDVSFISLNLILPALKEILVDGGQVVA 180

KLRQD RVRSMEQYNFRYAQ EDF EG P FASIDVSFISL+LILPAL +L D GQV+A
Sbjct: 121 KLRQDPRVRSMEQYNFRYAQPEDFNEGQPVFASIDVSFISLSLILPALHNVLSDQGQVIA 180

Query: 181 LIKPQFEAGREQIGKNGIVKDKLVHEKVLTTVTNFTKDYGYTVKHLDFSPIQGGHGNIEF 240

LIKPQFEAGREQIGK GIVKDK +HEKV+ V +F YG+TVK LDFSPIQGGHGNIEF
Sbjct: 181 LIKPQFEAGREQIGKKGIVKDKQIHEKVIQKVTDFASGYGFTVKGLDFSPIQGGHGNIEF 240

Query: 241 LMHLQKCQDPQNLVLDQIQDVIEKAHKEFKKMEEE 275

L HL K Q P+ L IQ V+ KAHKEF+K+E+E
Sbjct: 241 LAHLAKSQTPETLAPHLIQKVVAKAHKEFEKHEKE 275

SEQ ID 286 (GBS310) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 57 (lane 3; MW 34kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 61 (lane 4; MW 58.8kDa). 

The GBS310-GST fusion product was purified (Figure 210, lane 10) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 282), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
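
The `INTEGRAL` lines in this example report a likelihood score and two residue ranges for each predicted membrane-spanning segment. The sketch below, offered only as a reading aid, extracts those numbers from one such line. `parse_tm` is a hypothetical helper; the sketch does not interpret what the two ranges mean beyond calling them the reported range and the parenthesised range.

```python
import re

# Example line:
#   INTEGRAL Likelihood = -2.92 Transmembrane 150 - 166 ( 149 - 168)
TM = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_tm(line):
    """Return (likelihood, reported_range, parenthesised_range) or None."""
    match = TM.search(line)
    if match is None:
        return None
    likelihood = float(match.group(1))
    reported = (int(match.group(2)), int(match.group(3)))
    parenthesised = (int(match.group(4)), int(match.group(5)))
    return likelihood, reported, parenthesised

segment = parse_tm("INTEGRAL Likelihood = -2.92 Transmembrane 150 - 166 ( 149 - 168)")
# Number of residues in the reported range, counting both end points.
reported_length = segment[1][1] - segment[1][0] + 1
```

For the S.pyogenes line used here the reported range spans 17 residues, which gives a simple consistency check when a digit in such a line is illegible.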

Example 86 

A DNA sequence (GBSx0086) was identified in S.agalactiae <SEQ ID 289> which encodes the amino acid
sequence <SEQ ID 290>. Analysis of this protein sequence reveals the following:

Possible site: 18

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1966 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAA09426 GB:AJ010954 arginine repressor [Bacillus
stearothermophilus]
Identities = 49/153 (32%), Positives = 84/153 (54%), Gaps = 4/153 (2%)

Query: 121 HIFSLTADDNSLLLIAKSEADADHIRQSMIAML 153

I D++ L+I ++ DA+ + ++ ML
Sbjct: 117 EIVGTICGDDTCLIICRTAEDAEKVSGQLLGML 149



A related DNA sequence was identified in S.pyogenes <SEQ ID 291> which encodes the amino acid
sequence <SEQ ID 292>. Analysis of this protein sequence reveals the following:

Possible site: 50

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1717 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 87/154 (56%), Positives = 118/154 (76%), Gaps = 1/154 (0%)

Query: 1 MKKSERLNLIKQIVLNHAVETQHELLRRLKAYGVTLTQATISRDMNEIGIIKVPSAKGRY 60

MKKSERL LIK++VL H +ETQH+LLR L +G+ LTQATISRDMNEIGI+K+PS GRY
Sbjct: 12 MKKSERLELIKKMVLTHPIETQHDLLRLLAEHGLELTQATISRDMNEIGIVKIPSGSGRY 71

Query: 61 IYGLSNENDPIFTTAVAKPIKTSILSISDKLLGLEQFININVIPGNSQLIKTFIMSHCQE 120

IYGLS ++ + IK++IL++SDK GLEQ + + V+PGNS+LIK ++++ +
Sbjct: 72 IYGLSQDSGKKIVQG-PRSIKSTIIAVSDXTKGLEQHLYLKVVPGNSKLIKRYLLADFSK 130

Query: 121 HIFSLTADDNSLLLIAKSEADADHIRQSMIAMLE 154

IFSL ADD+SLLLIAKS ++AD IRQ ++ ++
Sbjct: 131 AIFSLIADDDSLLLIAKSPSEADMIRQEILLWMQ 164

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 87 

A DNA sequence (GBSx0088) was identified in S.agalactiae <SEQ ID 293> which encodes the amino acid
sequence <SEQ ID 294>. Analysis of this protein sequence reveals the following:

Possible site: 15

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3339 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 88 

A DNA sequence (GBSx0089) was identified in S.agalactiae <SEQ ID 295> which encodes the amino acid
sequence <SEQ ID 296>. This protein is predicted to be DNA repair protein RecN (recN). Analysis of this
protein sequence reveals the following:

Possible site: 50

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1651 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB14355 GB:Z99116 recN [Bacillus subtilis]
Identities = 244/567 (43%), Positives = 366/567 (64%), Gaps = 18/567 (3%)

Query: 1 MLLEISIKNFAIIEEISLNFETGMTVLTGETGAGKSIIIDAMNMMLGSRASVEVIRHGAN 60

ML E+SIKNFAIIEE++++FE G+TVLTGETGAGKSIIIDA+++++G R S E +R+G
Sbjct: 1 MLAELSIKNFAIIEELTVSFERGLTVLTGETGAGKSIIIDAISLLVGGRGSSEFVRYGEA 60

Query: 61 KAEIEGFFSVEKNQSLVQLLEENGIELADELII-RREIFQNGRSVSRINGQMVNLSTLKA 119

KAE+EG F +E ++ + E GI+++DE+I+ RR+I +G+SV R+NG++V +++L+
Sbjct: 61 KAELEGLFLI^SGHPVLGVCAEQGIDVSDEMIVMRRDISTSGKSVCRVNGKLVTIASLRE 120

Query: 120 VGHYLVDIYGQHDQEELMKPNMHILMLDEFGNTEFNVIKERYQSLFDAYRQLRKRVLDKQ 179

+G L+DI+GQHD + LM+ H+ +LD+F E + YQ + Y +L K++
Sbjct: 121 IGRLLLDIHGQHDNQLLMEDENKLQLLDKFAGAEV^SALXTYQEGYQRYVKLLKKL.KQLS 180

3 KJffiQENKSRIEMLEFQIAEIESVALKSDEDQTLI,KQI^KL^KNIiyDTLTOA.YLMLDNE 239
++EQE +++++FQ+ EIES L+ +ED+ h ++R ++ N 4

L V A L+ + + K+S ++S +YY++E+ T +■(- +++D+D+FD

Query: 240
Sbjct: 241
Query: 360
Sbjct: 360
Query: 404
Sbjct: 420
Query: 464
Sbjct: 480
Query: 524
Sbjct: 540

H RKYG V D+L+Y I +E + + +IH-KEL + D+
AK+L +EI +EL IjYMEK+ F +F
VSGRVAQAIA+KIHK+ QVL I+HL QV A+AD +1 K



A related DNA sequence was identified in S.pyogenes <SEQ ID 297> which encodes the amino acid 
sequence <SEQ ID 298>. Analysis of this protein sequence reveals the following: 

Possible site: 51

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1215 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 403/550 (73%), Positives = 472/550 (85%)

Query: 1 MLLEISIKNFAIIEEISLNFETGMTVLTGETGAGKSIIIDAMNMMLGSRASVEVIRHGAN 60

MLLEISIKNFAII+EISLNFE GMTVLTGETGAGKSIIIDAMNMMLG+RAS EVIR GAN
Sbjct: 2 MLLEISIKNFAIIDEISLNFENGMTVLTGETGAGKSIIIDAMNMMLGARASTEVIRRGAN 61

Query: 61 KAEIEGFFSVEKNQSLVQLLEENGIELADELIIRREIFQNGRSVSRINGQMVNLSTLKAV 120

KAEIEGFFSV+ LV LE +GI + +ELIIRR+IF NGRSVSRINGQMVNL+TLK V
Sbjct: 62 KAEIEGFFSVDATPELVACLESSGIAMEEELIIRRDIFANGRSVSRINGQMVNLATLKQV 121

Query: 121 GHYLVDIYGQHDQEELMKPNMHILMLDEFGNTEFNVIKERYQSLFDAYRQLRKRVLDKQK 180

G +LVDI+GQHDQEELM+P +H +LD FG+ F +KE YQ +FD Y+ LR++V+DKQK
Sbjct: 122 GQFLVDIHGQHDQEELMRPQLHQQILDAFGDKAFEQLKENYQLIFDRYKSLRRQVIDKQK 181

Query: 181 ^QENECSRIEMLEFQIAEIESvALKSDEDQTLLKQRDKLMJSraiOIIADTLTNAYLMLDNEE 240

NE+E+K RI+ML FQIAEIE+ AL ED h ++RD+LMNHK IADTLTNAY+MLDN+4-
Sbjct: 182 NEKEHKDRIDMLAFQIAEIEAAALSRGEDDRLNQERDRLMNHKQIADTLTNAYVMLDNDD 241

Query: 241 FSSLSNTOSAMNDLMALEEFDREYI03LSTNLSEAYY\ r IEEWI<RLGDVIDDLDFDAGLLQ 300
FSSLSN+RS+MNDL+++E+FD EYK +ST++SEAYY++EEV+K+L D ID LDFD G LQ

Query: 301 EIENRLDVINTITRKYGGDVNDVLDYFDNITKEYSLLTGSEESSaALSICELKILEHDLIE 360
EIE RLD++N++TRKYGG+VNDVLDY+DNI KEY LLTG + SS LE ELK LE L+
Sbjct: 302 EIEFRLDILNSLTRKYGGNVNDVLDYYDKIVKEYQLLTGDDLSSGDLEAELKSLEKQLVA 361 

Query: 361 SMQLSLERHKIJUCQLENEIKQELTELYMEKADFQVQFTKGKFNKEGNEIVEFYISTNPG 420 

+A++LS+ RH+LA+QLE EIK EL ELYMEKADF+V FT KFN++GNE +EFYISTNFG 
Sbjct: 362 AaSELSVSRHQIJffiQLFJffilKMLKELYMEKRDFKVHFTTSKFNRDGNESLEFYISTOPG 421 

Query: 421 EGFKPLVKVASGGELSRLMLAIKSAFSRKEDKTSIVFDEVDTGVSGRVAQAIAQKIHKIG 480 

EGFKPLVKVASGGELSRLMLAIK+A SRKEDKTSIVFDEVDTGVSGRVAQAIAQKI+KIG 
Sbjct: 422 EGFKPLVKVASGGELSRLMLAIKAAISRKEDKTSIVFDEVDTGVSGRVAQAIAQKIYKIG 481 

Query: 481 SHGQVLAISHLAQVIAIADYQYFIEKISSDSSWSTVRLLSYEERVEEIAKMIAGNNVTD 540 

HGQVLAISHL QVIAIADYQYFI K S + STVS VRLL+ EERVEEIA M+AG ++T 
Sbjct: 482 RHGQVLAISHLPQVIAIADYQYFISKESKEESTVSKVRLLTPEERVEEIASMIAGTDMTQ 541 

Query: 541 TARTQAKELL 550 

A TQA+ELL 
Sbjct: 542 AALTQARELL 551 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 89 

A DNA sequence (GBSx0090) was identified in S.agalactiae <SEQ ID 299> which encodes the amino acid 
sequence <SEQ ID 300>. This protein is predicted to be degV protein. Analysis of this protein sequence 
reveals the following: 

Possible site: 38 



Final Results 

bacterial membrane --- Certainty=0.1383 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB07346 GB:AP001519 unknown conserved protein [Bacillus halodurans] 
Identities = 93/277 (33%), Positives = 152/277 (54%), Gaps = 4/277 (1%) 



Query: 1
Sbjct: 1
Query: 61
Sbjct:
Query: 120
Sbjct: 121
Query: 180
Sbjct: 181
Query: 239
Sbjct: 241

M+KI IVTDS+ + P+ KEL + WPLSV+
TI+ ++EV++++
+S+ G +1 TH GE 4

A related DNA sequence was identified in S.pyogenes <SEQ ID 301> which encodes the amino acid
sequence <SEQ ID 302>. Analysis of this protein sequence reveals the following: 

Possible site: 37

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.54 Transmembrane 180 - 196 ( 180 - 196)
INTEGRAL Likelihood = -0.16 Transmembrane 21 - 37 ( 21 - 38)

Final Results

bacterial membrane --- Certainty=0.1617 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 197/279 (70%), Positives = 226/279 (80%), Gaps = 1/279 (0%)

Query: 1 MSKIKIVTDSSITIEPELIKELDITVVPLSVMIDGTLYSDNDLKAQGEFLNLMRGSKELP 60
M IKIVTDSSITIEPELIK LDITVVPLSVMID LYSDNDLK +G FL+LM+ SK LP
Sbjct: 5 MGTIKIVTDSSITIEPELIKRLDITVVPLSVMIDSKLYSDNDLKEEGHFLSLMKASKSLP 64

Query: 61 KTSQPPVGVFAEIYEKLMNEGVEHIIAIHLTHTLSGTIEASRQGANIAGADVTVIDSTFT 120
KTSQPPVG+ FAE YE- L+ +GV I+AIHL-i- LSGTIEASRQGA IA A VTV+DS FT
Sbjct: 65 KTSQPPVGLFAETYENLVKXGVTDIVAIHLSPALSGTIEASRQGAEIAEAPVTVLDSGFT 124

Query: 121 DQCQKFQVVEAAKLAKEGADLDTILARVEEVRQKSELFIGVSTLENLVKGGRIGRVTGLL 180
DQ KFQVVEAAK+AK GA L+ ILA V+ ++ K+EL+ IGVSTLENLVKGGRIGRVTG+L
Sbjct: 125

Query: 181
SSLLN4KV+M L N EL *VKGRG KTF+KWLD+++ R IAEI ISY G+A +A
Sbjct: 185 SSLI.NVKVVI'IALKITOELKTLVKGRGNKTFTraLDSYIAKNSHRPIAEIAISYAGEASL^ 244

Query: 241 NFREKLAV-LGAPISVLETGSIIQTHTGEEIAFAVMVRYE 278
+E++A ISVLETGSI IQTHTGE AFAVMVRYE
Sbjct: 245 TLKERIAAYYNHSISVLETGSIIQTHTGEGAFAVMVRYE 283

SEQ ID 300 (GBS113) was expressed in E.coli as a His-fusion product. Purified protein is shown in Figure
201, lane 8.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 90 

A DNA sequence (GBSx0092) was identified in S.agalactiae <SEQ ID 307> which encodes the amino acid 
sequence <SEQ ID 308>. Analysis of this protein sequence reveals the following: 

Possible site: 28

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: 13 WKWAFLLLLAINLSFTAVIASRLIQVREFl'ITGKISTGVQDICVKVGTFTTNKSQLNKTIAL 72

Query: 73 YLKQYQTKKMNYKIYAASSSILFEGSYQLLGYEVPLYIYFEPYRLTNGAVQLKVTSFSVG 132

+ + Y T KM +K+ +S I+FE SY++LG+ VPLY+YF P +GAV L+ + S G
Sbjct: 62 FAQDYSTDKMRFKVKVTNSKIVFESSYKVLGHAVPLYVYFTPLVSESGAVVLQESELSAG 121

Query: 133 TLPLPEKDVLQYIKSSYKLPNFVDIKPKKSVININLQDLKNKEGIYLKATAIDLVNDNFS 192

TL LP D L IK S KLP+++ I KK + +N+Q +KN +GI +A + DLVND
Sbjct: 122 TLKLPILDALNMIKRSTKLPDYIVIDSKKGKVILNIQSMKNDKGITAEAQSFDLVNDRSE 181

Query: 193 FDIFK 197

FDI+K
Sbjct: 182 FDIYK 186

A related DNA sequence was identified in S.pyogenes <SEQ ID 309> which encodes the amino acid
sequence <SEQ ID 310>. Analysis of this protein sequence reveals the following:

Possible site: 29

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAA72097 GB:Y11213 hypothetical protein [Streptococcus thermophilus] 
Identities = 73/185 (39%) , Positives = 112/185 (60%) , Gaps = 3/185 (1%) 



Query: 10 WKWSFLCLLAFNTAFLMVIASRLIQVREPESELIAKKPVKNIKIGTFVTTREQLNETVAS 69

WKW FL LLA N A + V+ R++ E + K K IG + ++E+L+E++
Sbjct: 5 WKVOjFLGLLAIJSnjALISVVTVRIMTPVETSPVSLPKGATK IGKYSMSKEEIoDESLRG 61

Query: 70 YLKDYQTEKMSYKFYATSSSILFEGTYQLLGYEVPLYIYFQPHRLENGAVQLQVISFSVG 129

+ +DY T+KM +K T+S I+FE +Y++LG+ VPLY+YF P E+GAV LQ S G
Sbjct: 62 FAQDYSTDKMRFKVKVTNSKIVFESSYKVLGHAVPLYVYFTPLVSESGAVVLQESELSAG 121

Query: 130 TLPLPEKDVLQYLKSSYKLPSFVKVMPNQSAIVVNLQDIQNDAKVYLKAKKIDLFNDEIS 189

TL LP D L +K S KLP ++ + + +++N+Q ++ND + +A+ DL ND
Sbjct: 122 TLKLPILDALNMIKRSTKLPDYIVIDSKKGKVILNIQSMKNDKGITARAQSFDLVNDRSE 181

Query: 190 FNIYK 194

F+IYK
Sbjct: 182 FDIYK 186



An alignment of the GAS and GBS proteins is shown below: 

Identities = 129/194 (66%) , Positives = 155/194 (79%) 

Query: 5 KTGRNMFWCTIAFLLLIAINLSFTAVIASRLIQvREPOTGKISTGVQDKVKVGTFTTNKS 64

K NLN+WKW+FL LLA N +F VIASRLIQVREP + 1+ +K+GTF T +

Sbjct: 2 KKKSNLNWWKWSFLCLLAFNTAFLMVIASRLIQVREPESELIAKKPVKNIKIGTFVTTRE 61

Query: 65 QLNKTIALYLKQYQTKKMNYKIYAASSSILFEGSYQLLGYEVPLYIYFEPYRLTNGAVQL 124

QLN+T+A YLK YQT+KM+YK YA SSSILFEG+YQLLGYEVPLYIYF+P+RL NGAVQL
Sbjct: 62 QLNETVASYLKDYQTEKMSYKFYATSSSILFEGTYQLLGYEVPLYIYFQPHRLENGAVQL 121

Query: 125 KVTSFSVGTLPLPEKDVLQYIKSSYKLPNFVDIKPKKSVININLQDLKNKEGIYLKATAI 184

+V SFSVGTLPLPEKDVLQY+KSSYKLP+FV + P +S I +NLQD++N +YLKA I
Sbjct: 122 QVISFSVGTLPLPEKDVLQYLKSSYKLPSFVKVMPNQSAIVVNLQDIQNDAKVYLKAKKI 181

Query: 185 DLVNDNFSFDIFKK 198

DL ND SF+I+KK
Sbjct: 182 DLFNDEISFNIYKK 195

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

A related GBS gene <SEQ ID 8487> and protein <SEQ ID 8488> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 7
McG: Discrim Score: 7.47
GvH: Signal Score (-7.5): 2.42
Possible site: 28
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 5.39 threshold: 0.0
PERIPHERAL Likelihood = 5.89 120
modified ALOM score: -1.58

*** Reasoning Step: 3

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

SEQ ID 308 (GBS20) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 4 (lane 5; MW 25kDa) and in Figure 167 (lane 12-14; MW 37kDa - thioredoxin
fusion). It was also expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is
shown in Figure 9 (lane 7; MW 47.6kDa). Purified Thio-GBS20-His is shown in Figure 244, lane 12.

Example 91 

A DNA sequence (GBSx0093) was identified in S.agalactiae <SEQ ID 311> which encodes the amino acid
sequence <SEQ ID 312>. This protein is predicted to be histone-like DNA-binding protein. Analysis of this
protein sequence reveals the following:

Possible site: 40

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2768 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9313> which encodes amino acid sequence <SEQ ID 9314>
was also identified.

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAD40810 GB:L40355 histone-like DNA-binding protein [Streptococcus mutans]
Identities = 43/47 (91%), Positives = 46/47 (97%)

Query: 1 MANKQDLIAKVAEATELTKKDSAAAVDAVFAAVADYLAEGEKVQLIG 47

MANKQDLIAKVAEATELTKKDSAAAVDAVF+AV+ YLA+GEKVQLIG
Sbjct: 1 MANKQDLIAKVAEATELTKKDSAAAVDAVFSAVSSYLAKGEKVQLIG 47

A related DNA sequence was identified in S.pyogenes <SEQ ID 313> which encodes the amino acid
sequence <SEQ ID 314>. Analysis of this protein sequence reveals the following:

Possible site: 25

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2834 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 41/47 (87%), Positives = 44/47 (93%)

Query: 1 MANKQDLIAKVAEATELTKKDSAAAVDAVFAAVADYLAEGEKVQLIG 47

MANKQDLIAKVAEATELTKKDSAAAVDAVF+ + +LAEGEKVQLIG
Sbjct: 1 MANKQDLIAKVAEATELTKKDSAAAVDAVFSTIEAFLAEGEKVQLIG 47

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
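
The middle line of each alignment carries the information summarised in the "Identities" and "Positives" figures: a letter marks an identical residue and `+` marks a conservative substitution. Because column spacing is not reliably preserved in the printed alignments, position-by-position comparison is fragile, but simple counting still works. The Python sketch below is an illustration only (`count_match_line` is a hypothetical helper, not part of the original analysis) and uses the middle line of the GAS/GBS alignment of this example.

```python
def count_match_line(match_line):
    """Count identities (letters) and positives (letters plus '+') in the
    middle line of an alignment. Blanks are ignored, so collapsed spacing
    in the printed alignment does not affect the result."""
    identities = sum(1 for ch in match_line if ch.isalpha())
    positives = identities + match_line.count("+")
    return identities, positives

# Middle line of the 47-residue GAS/GBS alignment shown in this example.
match_line = "MANKQDLIAKVAEATELTKKDSAAAVDAVF+ + +LAEGEKVQLIG"
identities, positives = count_match_line(match_line)
```

The counts obtained this way agree with the header of that alignment (41 identities and 44 positives out of 47 aligned positions).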

Example 92 

A DNA sequence (GBSx0094) was identified in S.agalactiae <SEQ ID 315> which encodes the amino acid
sequence <SEQ ID 316>. Analysis of this protein sequence reveals the following:

Possible site: 54

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2722 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9293> which encodes amino acid sequence <SEQ ID 9294> 
was also identified. A further related GBS nucleic acid sequence <SEQ ID 10793> which encodes amino 
acid sequence <SEQ ID 10794> was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD17886 GB:AF100456 hyaluronate-associated protein precursor 
[Streptococcus equi] 
Identities = 303/435 (69%) , Positives = 360/435 (82%) , Gaps = 1/435 (0%) 



Query: 1
Sbjct: 87
Query: 61
Sbjct: 147
Sbjct: 206
Query: 181
Sbjct: 266
Query: 241
Sbjct: 326
Query: 301

+A KVDVS +DGLTYTATLR GLKWSDGS LTA+DFVYSWQR+VDPKTAS+YAYLA E H+
YK+G+LD ANIS TSAIY ANK +KDV VLEATTAY+ YN TG+++GL+++KIR+ALNL
AT+RKG+V AAVDTGSKPA A PTGLAK DGTDL ++VAPGY+Y+ EAAKLFKEGLA
EG h +TITADADAPAAK++VDYI K TWE ALPGLTVEEKFV FKQRLED++ QNF+
Sbjct: 386 ELGKDALTITITADADAPAAKSAVDYIKETWETALPGLTVEEKFVPFKQRLEDTKNQNFE 445

Query: 361 IVVSLWGGDYPEGSTFYGLFKSDSQNNDGKFANKDYDAAYNKAISEDAMKPAESAKDYKE 420
+ V LWGGDYP+GSTFYGLFKS S N GKF N DYDAAYNKA++ DA+ +A DYK
Sbjct: 446 VAVVLWGGDYPKGSTFYGLFKSGSAYNYGKFTNADYDAAYNKALTTDALNTDAAADDYKA 505

Query: 421 AEKTLFEQGAYNPLY 435

AEK L++ YNPLY
Sbjct: 506 AEKALYDNALYNPLY 520

A related GBS gene <SEQ ID 8489> and protein <SEQ ID 8490> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: 21 Crend: 4
Sequence Pattern: CGSK
SRCFLG: 0
McG: Length of UR: 19
Peak Value of UR: 2.34
Net Charge of CR: 3
McG: Discrim Score: 5.94
GvH: Signal Score (-7.5): 0.6
Possible site: 20
>>> May be a lipoprotein
Amino Acid Composition: calculated from 22
ALOM program count: 0 value: 5.14 threshold: 0.0
PERIPHERAL Likelihood = 5.14 166
modified ALOM score: -1.53

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP|4336671|gb|AAD17886.1||AF100456 hyaluronate-associated protein
precursor {Streptococcus equi}

Score = 721 bits (1840), Expect = 0.0
Identities = 354/515 (68%), Positives = 417/515 (80%), Gaps = 2/515 (0%)

Query: 1 KNWRRVGVGVLTLASVATLAACGSK-SASQD3KGAINWAIPTEINTLDLSKVTDTYSNLA 59
K +R+G+ +TLASVA L ACG+K SAS D INW PTEI TLD+SK TDTYS LA
Sbjct: 7 KACKRLGIjAAVTLASVAALMACGNKQSASTDKKSEINWYTPTEIITLDISKNTDTYSALA 66

R+VDPKTAS +YAYLA E H+ NA+ IN G+ DL+ LGVKA+G+ KV+ TL+ P+PQF

Query: 60
Sbjct: 67
Query: 120
Sbjct: 127
Query: 180
Sbjct: 186
Query: 240
Sbjct: 246
Query: 300
Sbjct: 306

L+F+NF+PQK+ V+ GKDY TTS+ +YSGPY V+ WNG++GTF L KNKNYWDAKN
VKT+ V +QTVKKPDTAVQMYK4G+LD ANIS TSAIY ANK +KDV VLEATTAY+ Y
j TG+++GL+++KIR+ALNLAT+RKG+V AAVDTGSKPA A PTGLAK DGTDL ++V

Query: 360 APGYEYNKTEAAKLFKEGLAESGLTKLKLTITADADAPAAKNSVDYIKSTWEAALPGLTV 419

APGY+Y+ EAAKLFKEGLAE G L +TITADADAPAAK++VDYIK TWE ALPGLTV 
Sbjct: 366 APGYKYDDKEAAKLFKEGLAELGKDALTITITAimD^ARKSAVDYIKETVffiTALPGLTV 425 

EEKFV FKQRLED++ QNF++ V LWGGDYP-GSTFYGLFKS S N GKF N DYDAAY 
Sbjct: 426 EEKFVPFKQRLEDTKNQNFEVAWLWGGDYPKGSTFYGLFKSGSAYHYGKFTWADYDMY 485 

Query: 480 NKAISEDAMKPAESAKDYKEAEKILFEQGAYNPLY 514 

NKA+4- DA+ +A DYK AEK L++ YNPLY 
Sbjct: 486 NKALTTDAIiNTDAAADDYKAAEKALYDNALYNPLY 520 

A related DNA sequence was identified in S.pyogenes <SEQ ID 317> which encodes the amino acid
sequence <SEQ ID 318>. Analysis of this protein sequence reveals the following:

Possible site: 24

>>> May be a lipoprotein

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 114/428 (26%), Positives = 185/428 (42%), Gaps = 63/428 (14%)

Query: 7 VSKDGLTYTATLRKGLKW- - SDGSK LTAKDFVYSWQRLVDPKTASQYAYLAVEGHVL 61 

VSKDGLTYT TLR G+ W +DG + +TA+DFV + VD K+ + Y VE + 
Sbjct: 92 VSKDGLTYTYTLRDGVSWYTADGEEYAPWAEDFVTGLICHAVDDKSDALY WEDSIK 148 

Query: 62 NADKINEGQEKDLNKLGVKAEGDDKWITLSSPSPQFIYYIiAFTNFMPQKQEWEKYGKD 121 

N G E D ++GVKA D V TL+ P + ++ P + ++ GKD 

Sbjct: 149 NLKAYQNG-EVDFKEVGVKALDDKWQYT™KPESYWNSKTTYSVLFPVNAKFLKSKGKD 207 

Query: 122 YATTS KNTV- YSGPYTVEGWNGSNGTFTLKKNJ<NYWDAKNVKTKEVRI - - QTVKKPDTAV 178 

+ TT +++ +G Y + + S + KN+NYWDAKNV + V++ P + 

Sbjct: 208 FGTTDPSSILWGAYFLSAFT-SKSSMEFHKNENYWDAKNVGIESVKLTYSDGSDPGSFY 266 

Query: 179 QMYKRGELDAANISNTSAIYQANKNN--KDVT-DVLEATTAYMEYNTT 223 

+ + +GE A + Y++ K N ++T +L ++ +N 

Sbjct: 267 KNFDKGEFSVARLYPNDPTYKSAKKOTADKITYGMLTGDIRHLTWNLNRTSFKOTKKDPA 326 

Query: 224 ---GSVKGLDNVKIPJIALNLATNRKGWQAATOTGSKPA IAFAPT- -GLAKTPDGT 274 

K L+N R+A+ A +R +K + PT + ++ G+ 

Sbjct: 327 QQDAGKKALNNKDFRQAIQFAFDRASFQAQTAGQDAKTKALRNMLVPPTFVTIGESDFGS 386 

Query: 275 DLAKYVAP - G YE - YNKTEAAKLF KEGLAESGLT-KLKLTITADAD 316 

++ K +A G E YN +A F KE L G+T ++L D 

Sbjct: 387 EVEKE^KLGDEWIOJVNLADAQDGFYNPEKAKAEFAKAKEALTAEGVTFPVQLDYPVDQA 446 

Query: 317 APAAKNSVDYIKSTWEAALPGLTV EEKFVTFKQR LEDSRKQNFD I WS LWGG 368 

A K + EA+L V E + T + + E +Q++DI+ S WG 

Sbjct: 447 NAATVQFAQSFKQSViMLGKENVIVKVLETETSTHEAQGFYAETPEQQDYDIISSVWGP 506 

Query: 369 DYPEGSTF 376 

DY + T+ 
Sbjct: 507 DYQDPRTY 514 
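
The summary line that heads each alignment (Identities, Positives, Gaps) can be read programmatically. The following is a minimal sketch, not part of the original analysis; the names `FIELD` and `parse_summary` are hypothetical, and the pattern is deliberately tolerant of the stray spaces and the "-" in place of "=" that occur in the scanned text. It recomputes each ratio next to the stated percentage without assuming which rounding rule the alignment program used.

```python
import re

# One "Label = n/m (p%)" field. The character class accepts "=" or "-"
# because the scanned text sometimes shows a dash instead of an equals sign.
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*[=-]\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_summary(line):
    """Return {label: (count, alignment_length, stated_percent)}."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in FIELD.finditer(line)
    }


summary = parse_summary(
    "Identities = 114/428 (26%) , Positives - 185/428 (42%) , Gaps = 63/428 (14%)"
)
for label, (count, length, stated) in summary.items():
    # Show the recomputed ratio beside the percentage printed in the text.
    print(f"{label}: {count}/{length} = {100 * count / length:.1f}% (stated {stated}%)")
```

Printing the recomputed ratio beside the stated value makes it easy to spot a count or percentage that was damaged during scanning.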

SEQ ID 9294 (GBS663) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 137 (lane 3; MW 89.5kDa). It was also expressed in E.coli as a His-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 137 (lanes 5-7; MW 64.5kDa), in Figure
179 (lane 11; MW 65kDa) and in Figure 65 (lane 2; MW 61kDa). Purified GBS663-His is shown in Figure
231, lanes 3-4. Purified GBS324-His is shown in lane 6 of Figure 210.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 93 

A DNA sequence (GBSx0095) was identified in S.agalactiae <SEQ ID 319> which encodes the amino acid 
sequence <SEQ ID 320>. This protein is predicted to be transmembrane protein OppB (oppB). Analysis of 
this protein sequence reveals the following: 

Possible site: 37 



INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 



Likelihood = -9 
Likelihood = -6 
Likelihood = -4 
Likelihood = -3 



Transmembrane 



Transmembrane 



- 254 



281 - 313: 



105 - 132! 
140 - 166) 



237 - 255! 



Final Results

bacterial membrane --- Certainty=0.5310 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 8491> which encodes amino acid sequence <SEQ ID 8492> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database:



Query: 13 MIKYILKRVAILLVTLWWITLSFFLMQILPGTPYNNP-KLTEEMIALLNKQYGLDKPVW 71 

M+KY LKRV +L+TL+++ +++F LM+ LPGTPY N KL++E I + N++YGL+ + 

Sbjct: 1 IWKYTLKRVLYMLITLFIIASVTFVLMKFLPGTPYRNQEKLSDEQIHMTNEKYGLNDSIP 60 

Query: 72 QQYLTYLWNVLHGDFGTSYQSVNQPVSRMISLRLGVSVHLGVQALVFGVLGGILVGAISA 131 

QY Y+ ++ GD G S+Q N+PVS ++S +G SV L ++A+ FGV+ GIL+G I+A 

Sbjct: 61 VQYFNYMTGLVKGDLGVSFQLDNRPVSEILSALIGPSVQLALEAMAFGVIFGILLGVIAA 120 



Sbjct: 



132 RHKNDKVDGILSVIATLGISMPSFIIGILLLDYFGFKWNLIiPLSGWGTFSQTILPSLALG 191 

++N D + IA LG S+PSF+ +L + G K + P++GWGTF+ TILP+ AL 
121 ^QNRWPDYTSTFIAILGKSVPSFVFATA^QYKLGAKLQIFPVAGWGTFADTILPAFALA 180 



Query: 192 LPTLRSVSRFFRSEMIETLNSDYVQLaRSKGMTIRQVTRKHAYRKSMIPILTLIGPLAAG 251 

+ LA+ +RF R+E+I+ SDYV LA++KG + +V KHA RN++IP++T++GPL+ 
Sbjct: 181 MFPIATAARFMRTELIDVFASDYVLIAKAKGNSRTEVAVKHAIRNALIPLITVLGPLSVA 240 



Query 
Sbjct 



252 LLTGSALIEQIFSIPGIGQQFVTSIPTKDYPVIMGTTIVYAVMLMVAILITDWISIVDP 311 

L+TGS +IE I+SIPGIG QFV+SI T DYPVIMGTTI++AVML4 IL+ D++ ++DP 
241 LMTGSLVIENIYSIPGIGSQFVSSIQTNDYPVIMGTTILFAVMLVFVILVVDILYGLIDP 300 



Query: 312 RVRL 315 
R+R+ 

Sbjct: 3 01 RIRV 3 04 



There is also homology to SEQ ID 64. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 9069> which encodes the amino acid sequence
<SEQ ID 9070>. Analysis of this protein sequence reveals the following:

Possible site: 25





>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -8.81 Transmembrane 466 - 482 ( 463 - 493)
INTEGRAL Likelihood = -5.10 Transmembrane 419 - 435 ( 418 - 440)
INTEGRAL Likelihood = -4.78 Transmembrane 328 -     ( 322 - 348)
INTEGRAL Likelihood = -4.41 Transmembrane 365 - 382 ( 365 - 384)
INTEGRAL Likelihood = -4.09 Transmembrane 290 - 306 ( 287 - 311)
INTEGRAL Likelihood = -2.97 Transmembrane 17 - 33 ( 13 - 36)

Final Results

bacterial membrane --- Certainty=0.4524 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS sequences follows:



Sbjct: 



1 IGFFGVMFSYIVGLPLGLFMARFKNTYFDSFSTATMTFMLALPSIAV-IYWRFLGGMVG 349 
+G ++F + G+ +G AR KN D + T +++PS + I ++ + G 
LGVQALVFGVLGGILVGAISARHKNDKVDGILSVIATLGISMPSFIIGILLLDyFGFKWN 158 



Query: 350 LPDSFPMLGASDPKSYILPALILGIIiNIPTTVIWFRRYLVDLQASDWVRFARSKGLSESE 409 

L P+ G ILP+L LG+ + + +FR +++ SD+V+ ARSKG++ + 

Sbjct: 159 L---LPLSGWGTFSQTILPSLALGLPTLASVSRFFRSEMIETLNSDYVQLARSKGMTIRQ 215 

Query: 410 IYRGHLFKMAMVPIVSGVPASIILAIGGATLTETVFAFPGMGKMLIDSIKSANNSMIVGL 469 

+ R H ++N+M+PI++ + + G+ L E +F+ PG+G+ + SI + + +I+G 

Sbjct: 216 VTRKHAYRNSMIPIIiTLIGPLAAGLLTGSALIEQIFSIPGIGQQFVTSIPTKDYPVIMGT 275 



Query: 
Sbjct: 



470 TFIFTVLSIVSLLLGDIVMTLVDPRIKL 497 

T ++ V+ +V++L+ D+V+++VDPR++L 
276 TIVYAVMLMVAILITDWISIVDPRVRL 303 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 94 

A DNA sequence (GBSx0096) was identified in S.agalactiae <SEQ ID 321> which encodes the amino acid 
sequence <SEQ ID 322>. This protein is predicted to be transmembrane protein OppC (oppC). Analysis of 
this protein sequence reveals the following:

Possible site: 59

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =        Transmembrane 311 - 327
INTEGRAL Likelihood =        Transmembrane 42 - 58
INTEGRAL Likelihood =        Transmembrane 142 - 158
INTEGRAL Likelihood =        Transmembrane 182 - 198 ( 179 - 214)
INTEGRAL Likelihood =        Transmembrane 257 - 273 ( 257 - 276)

Final Results

bacterial membrane --- Certainty=0.5607 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database:

>GP:AAF73092 GB:AF103793 transmembrane protein OppC [Listeria monocytogenes]

Identities = 157/325 (48%), Positives = 219/325 (67%), Gaps = 4/325 (1%)
Query: 20 EKIEKPALSFMQDAWPJJLKKNKLAWSLYLLALLLTFSLASNLFOTQKDANGFDSKKVTT 79 






EKI +P+L+F+QD+W R+4KNK A+VSL +LAL++ 



Sb j ct : 


22 


Query: 


80 


Sbj ct: 


82 


Query: 


138 


Sbjct: 


140 


Query: 


19B 


Sbjct: 


200 


Query: 


258 


Sbjct: 


260 


Query: 


318 


Sbjct: 


320 



3 R+SL++A+ A DL+IGV YGL+SG+ GGR+D MQR++EVI +IPNLV+V + 



+L GI +III+IA T W +M+R VR 



f- +EFV+A+ +LGES KI KH++PNI 



1 +IPSAI +EA LS I LG+ P ASLG L++D + LQ PY ++ P 4 



L +1 +AF L+ DGLRDAFDPK D 



A related DNA sequence was identified in S.pyogenes <SEQ ID 323> which encodes the amino acid 
sequence <SEQ ID 324>. Analysis of this protein sequence reveals the following: 

Possible site: 59

>>> Seems to have no N-terminal signal sequence



INTEGRAL Likelihood =-10 
Likelihood 
Likelihood 
Likelihood 
INTEGRAL Likelihood 
Likelihood 



Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 



- 59 ( 37 - 65) 

- 127 ( 109 - 135) 

- 295 ( 270 - 298) 

- 188 ( 172 - 188) 

- 161 ( 145 - 165) 

- 239 ( 223 - 239) 



Final Results

bacterial membrane --- Certainty=0.5118 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 91/325 (28%) , Positives = 156/325 (48%) , Gaps = 34/325 (10%) 

SSTQEKIEKPALSEMQDAWRRLKKNKLAWSLYLLALLLTFSLASNLFVTQKDANGFDSK 7 5 
S E 1+ PA S+ + +R+ K V L +L +L S +F +D 
SFASEOTDTPAYSYWKSVFRQFFSKKSTVFMLVILVTVLMMSFIYPMFAN- YDFN 6 9 



KVTTYRNLPPKLSSNLPFWNGSIKYAGNTESTDAYKSQNVPEKVKYALGTDSLGRSVAKR 135 

V+ + + + + +Y GTD G+S+ 

DVSNIND FSKRYIWPNAEYWFGTDKNGQSLFDG 102 



Query: 




Sbjct: 


16 


Query: 




Sbjct: 


70 


Query: 


136 


Sbjct: 


103 




196 


Sbjct: 


162 


Query: 


256 


Sbjct: 


222 




316 



3 R S+L+++ AT I++ IGV G + 



D +M I +IS+IP+++I+ +L 



LG G +I++ TGW ++ +R L YR+ E+ LA+++LG 



EA LS +G+ T SLG I++ 
Sbjct: 282 VTLILVSLPLYIVGQNDADASDPRS 306

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 95 

A DNA sequence (GBSx0097) was identified in S.agalactiae <SEQ ID 325> which encodes the amino acid 
sequence <SEQ ID 326>. This protein is predicted to be ATPase OppD (oppD). Analysis of this protein 
sequence reveals the following: 

Possible site: 20 



Final Results

bacterial membrane --- Certainty=0.1341 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF73093 GB:AF103793 ATPase OppD [Listeria monocytogenes] 

Identities = 230/342 (67%), Positives = 283/342 (82%), Gaps = 2/342 (0%) 



Query: 63 NSEI-SGOTQFKGRFrLVEIiSEEEWTKVRGNEISMrFQDPMTSLDFTMKIGMQIAEPlWrH 121 

NSEI SG + F G ++ + E+ + K4RG +I+MIFQDPMTSL+PTM IG QI+EP++ H 

Sbjct: 62 NSEIKSGQILFNGMDIAKAHEKQMQKIRGKDIAMIFQDPMTSLNPTMTIGKQISEPLIKH 121 

Query: 122 QKISKKDALKLALELMKDVGIPNAEEHINDYPHQWSGGMRQRAVIAIALAADPEILIADE 181 

QKISK +A K AL L++ VGI NAEE I YPHQ+SGGMRQR VIAI+LA +P+ILIADE 

Sbjct: 122 QKISKHEAHKTALRLLQLVGIANAEERIKQYPKQFSGG^QRVVIAISLACNPQILIADE 181 

Query: 182 PTTALDVTIQAQIimMKKIQAERDSSIVFITHDLGVTOGI^RVAVMYAGKIVEFGTVD 241 

PTTALDVTIQAQIL+LMK +Q + D+SI+FITHDLGWA +ADRVAVMY GKIVE GTVD 

Sbjct: 182 PTTALD VT IQAQI LDLMKDLQKKIDTS 1 1 F I THDLGWANVADRVAVMYGGKI VE I GTVD 241 

Query: 242 EVFYNPQHPYTWGLLNSMPTTDTESGSLES I PGTPPDLLNPPKGDAFAARNEFALDIDHE 301 



Query: 302 E 

EEPP FKVS+TH+AATWLL +P+V PP + +R E++ E+ 
Sbjct: 302 EEPPLFKVSDTHYAATWLLHPDAPEVTPPDAVLRRQEQFAEL 343 

There is also homology to SEQ ID 72. 

SEQ ID 326 (GBS375) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 64 (lane 9; MW 42kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 71 (lane 3; MW 67kDa). 

GBS375-GST was purified as shown in Figure 215, lane 10. 
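
As a rough consistency check on the two band sizes reported for GBS375, the difference between the GST-fusion and His-fusion bands should mainly reflect the larger GST moiety. The sketch below uses only the two apparent molecular weights quoted above; the function name `tag_difference` is hypothetical, and the comparison figures (a GST moiety of roughly 26 kDa, a His tag of about 1 kDa) are general values, not taken from this document.

```python
# Apparent molecular weights (kDa) as quoted in the text for GBS375.
his_fusion_kda = 42.0  # His-fusion band, Figure 64, lane 9
gst_fusion_kda = 67.0  # GST-fusion band, Figure 71, lane 3


def tag_difference(gst_kda, his_kda):
    """Difference between the GST-fusion and His-fusion bands, in kDa."""
    return gst_kda - his_kda


# The result can be compared with the expected size gap between the two tags.
print(tag_difference(gst_fusion_kda, his_fusion_kda))
```

SDS-PAGE estimates are approximate, so only a large discrepancy from the expected tag difference would be informative.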

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 96

A DNA sequence (GBSx0098) was identified in S.agalactiae <SEQ ID 327> which encodes the amino acid 
sequence <SEQ ID 328>. Analysis of this protein sequence reveals the following: 

Possible site: 28

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3060 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
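
The "Final Results" block reports one certainty value per predicted compartment, and the compartment with the highest value is the one marked "Affirmative". A minimal sketch of reading such a block follows. It is an illustration only: `LINE`, `parse_final_results` and `best_compartment` are hypothetical names, and the pattern is written to accept both the clean layout and the spaced-out digits seen in the scanned text.

```python
import re

# "bacterial <compartment> [---] Certainty=<number> (<verdict>)".
# Digits may carry stray spaces from scanning, e.g. "0 . 3060".
LINE = re.compile(
    r"bacterial\s+(\w+)[\s\-\u2014]*Certainty\s*=\s*([0-9. ]+?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return {compartment: (certainty, verdict)} for a Final Results block."""
    results = {}
    for m in LINE.finditer(text):
        certainty = float(m.group(2).replace(" ", ""))
        results[m.group(1)] = (certainty, m.group(3).strip())
    return results


def best_compartment(results):
    """Name of the compartment with the highest certainty."""
    return max(results, key=lambda name: results[name][0])


block = """
bacterial cytoplasm --- Certainty=0.3060 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
results = parse_final_results(block)
print(best_compartment(results), results)
```

Collapsing the internal spaces before conversion is a design choice that lets the same function read lines that have not yet been cleaned up.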

The protein has homology with the following sequences in the GENPEPT database: 



MTENRi<mVEVKNVSLTENKGKANEvT^IDOTSFDIYEGEVFGLVGESGSGKTTVGRSIL 60 
M E +KL+E+K++ F + V+A+D++SFDIY+GE GLVGESG GK+T GRSI + 
MNELTEKLLEIKHLKQHFVTPRGT-VKAVDDLSFDIYKGETLGLVGESGCGKSTTGRSII 59 



HKLAK+K +R +V LL+ VGINK+H RYPHEFSGGQRQRIGIARAIAV+P+FIIADE 



PISALDVSIQAQWNLM++LQ+E+GLTYLFIAHDLSM\'-KYISDRIGVM++GKL+E+ +D 



P+HPYTKSLLSAIP PDP+ ER RV Q Y+P++ Q DG+ + E+ PGHFV+ 





1 


Sbjct: 


1 






Sbjct: 


60 




120 


Sbjct: 


120 




180 


Sbjct: 


180 




240 


Sb j Ct : 


240 




298 


Sbjct: 


300 



A related DNA sequence was identified in S.pyogenes <SEQ ID 329> which encodes the amino acid 
sequence <SEQ ID 330>. Analysis of this protein sequence reveals the following: 

Possible site: 47

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3900 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 164/306 (53%), Positives = 228/306 (73%), Gaps = 3/306 (0%) 

Query: 6 KKLvEVKNVSLTFNKGKANEVRAIDNVSFDIYEGEVFGLVGESGSGKITVGRSILKLYDI 65 

+KLVEVK++ ++F +GK V A+ N +F I +GE F LVGESGSGKTT+GR+I+ L D 
Sbjct: 3 EKLVEVKDLEISFGEGKKKFV-AVK^!fiNFFIKKGETFSLVGESGSGKITIGRAIIGI J NDT 61 

Query: 66 SDGEITFNGEVISHLKGKA-LHSFRKDAQMIFQDPQASmGFJIKIPJDIVAEGLDIHKLAK 124 
S G+I ++G+VI+ K K+ + + QMIFQDP ASLN R + I++EGL L K 

Query: 125 SKSDRDSKVQALLDLVGLNKDHLTRYPKEFSGGQRQRIGIARAIAVEPKFIIADEPISAL 184 

++ +R K++ ++ VGL +HLTRYPHEFSGGQRQRIGIARAL + P+F+IADEPISAL 
Sbjct: 122 TEEERKEKIKKMMAEVGLLSEHLTRYPHEFSGGQRQRIGIARAL'TMNPEFVIADEPISAL 181 

Query: 185 DVS I QAQ VVNLMQKLQRECjGIjTYLF IAKDLSMVKYI SDRIGVMHWGKLLEVGTSDDVYNN 244 

DVS++AQV+NL++++Q E+GLTYLFIAHDLS+V++ISDRI V+H G ++EV +++++NN 
Sbjct: 182 DVSVRAQVIOT.LKRMQAEKGLTYLFIAHDLSVVRFISDRIAVIHKGVIVEVMTEELFNN 241 

Query: 245 PIHPYTKSLLSAIPEPDPESERQRVHQPYNPAIEQDGQER-QMHEITPGHFVLSTPQEAE 303 

PIHPYT+SLLSA+P PDP ERQ+ Y+P ++ M EI P HFV + EE 

Sbjct: 242 PIHPYTQSLLSAVPIPDPILERQKELWYHPDQKDYTLDKPSMVEIKPNHFVWANQAEIE 301 

Query: 304 EYKKQI 309 
+Y+K++ 

Sbjct: 302 KYQKEL 307 
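
In alignments of this layout, the middle line marks each column: a letter denotes an identical residue, "+" a conservative substitution, and a space neither. The Identities and Positives counts in the header are therefore tallies over that middle line. A minimal sketch using the final six-residue block of the alignment above (the function name `count_midline` is hypothetical):

```python
def count_midline(midline):
    """Return (identities, positives) tallied from an alignment middle line.

    A letter marks an identical pair; "+" marks a conservative substitution.
    Positives include the identities.
    """
    identities = sum(1 for ch in midline if ch.isalpha())
    positives = identities + midline.count("+")
    return identities, positives


query = "EYKKQI"
midline = "+Y+K++"
sbjct = "KYQKEL"

# Cross-check: identities can also be counted directly from the two sequences.
direct_identities = sum(1 for q, s in zip(query, sbjct) if q == s)

print(count_midline(midline), direct_identities)
```

Summing these tallies over every block of an alignment should reproduce the counts in its header, which gives a quick check for blocks damaged during extraction.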

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 97

A repeated DNA sequence (GBSx0099) was identified in S.agalactiae <SEQ ID 331> which encodes the
amino acid sequence <SEQ ID 332>. Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3021 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 98 

A repeated DNA sequence (GBSx0100) was identified in S.agalactiae <SEQ ID 333> which encodes the
amino acid sequence <SEQ ID 334>. Analysis of this protein sequence reveals the following:

Possible site: 24

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0352 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 99

A repeated DNA sequence (GBSx0101) was identified in S.agalactiae <SEQ ID 335> which encodes the
amino acid sequence <SEQ ID 336>. Analysis of this protein sequence reveals the following:

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5857 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 100 

A repeated DNA sequence (GBSx0103) was identified in S.agalactiae <SEQ ID 337> which encodes the
amino acid sequence <SEQ ID 338>. Analysis of this protein sequence reveals the following:

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1472 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 101 

A repeated DNA sequence (GBSx0104) was identified in S.agalactiae <SEQ ID 339> which encodes the
amino acid sequence <SEQ ID 340>. Analysis of this protein sequence reveals the following:

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0111 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 102 

A repeated DNA sequence (GBSx0105) was identified in S.agalactiae <SEQ ID 341> which encodes the
amino acid sequence <SEQ ID 342>. Analysis of this protein sequence reveals the following:

Possible site: 20

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0 . 5S28 (Affirmative) < suco 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 103 

A repeated DNA sequence (GBSx0106) was identified in S.agalactiae <SEQ ID 343> which encodes the
amino acid sequence <SEQ ID 344>. Analysis of this protein sequence reveals the following:

Possible site: 39

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2059 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 104

A repeated DNA sequence (GBSx0107) was identified in S.agalactiae <SEQ ID 345> which encodes the
amino acid sequence <SEQ ID 346>. Analysis of this protein sequence reveals the following:

Possible site: 21

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2045 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 105 

A DNA sequence (GBSx0108) was identified in S.agalactiae <SEQ ID 347> which encodes the amino acid 
sequence <SEQ ID 348>. Analysis of this protein sequence reveals the following: 

Possible site: 36

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3031 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: 1 MKIFEKAPAKLNLGLDIKGRCDDGYHELAMIMVSIDLNDYVTISELKEDCIVIDSDSSKM 60 

M+I EKAPAK+NL LD+ + DGYHE+ MIM +IDL D + ++EL ED + + S + + 
Sbjct: 1 MRILEKAPAKINLSLDVTRKRPDGYHEVEMIMTTIDLADRIELTELAEDEVRVSSHNRFV 60 

Query: 61 PLNOTNDVFKAADIIKNQYGINKGVHIRLEKSIPVCAGLGGGSTDAAATIRAl^NRLWNLQ 120 

P + N ++AA +IK++Y + KGV I + K IPV AGL GGS+DAAAT+R LNRLWNIi 
Sbjct: 61 PDDQPJ^IAYQAAKl.IKDRYNVl^GVSIMITKVIPVARGIAGGSSDAAATLRGLNRLWNLN 120 

Query: 121 MDYDEMVAIGFKIGSDVPYCLGGGCSLVLGKGEIVKPLPTIiRPCWIVLVKPDFGISTKSI 180 

+ + + +G +IGSDV +C+- GG +L G+GE +K + T CW++L KP G+ST + 
Sbjct: 121 LSAETLAELGAEIGSDVSFCVYGGTALATGRGEKIKHlSTPPHCl'JVIIiAKPTIGVSTAEV 180 

Query: 181 FRDIDCKSISRVDIDLLKSAILSSDYQLMVKSMGNSLEDITITKNPVISTIECERMLNSGA 240 

+R + I D+ + AI +Q M +GN LE +T+ +P ++ IK +M GA 
Sbjct: 181 YRALKLDGIEHPDVQGMIEAIEEKSFQKMCSRLGNVLESVTLDMHPEVAMIKNQMKRFGA 240 

Query: 241 DVALMTGSGPTVFSMCSTEKKADRVFNSMKGFCKEVYKVRLL 282 

D LM+GSGPTVF + E K R++N ++GFC +VY VR++ 
Sbjct: 241 DAVLMSGSGPTVFGLVQYESKVQRIYNGLRGFCDQVYAVRMI 282 

A related DNA sequence was identified in S.pyogenes <SEQ ID 349> which encodes the amino acid 
sequence <SEQ ID 350>. Analysis of this protein sequence reveals the following: 

Possible site: 44

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.87 Transmembrane 28 - 44 ( 27 - 45)

Final Results

bacterial membrane --- Certainty=0.2147 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
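
Each "INTEGRAL" line in the analysis output carries a likelihood score and two residue ranges. A minimal sketch of extracting these fields follows. The names `Segment`, `PATTERN` and `parse_segment` are hypothetical, and reading the first range as the predicted membrane-spanning core and the parenthesised range as its wider extent is an assumption about the output format, not something the text states.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    likelihood: float
    core: Tuple[int, int]    # first range on the line (assumed core segment)
    extent: Tuple[int, int]  # parenthesised range (assumed wider extent)


PATTERN = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segment(line: str) -> Optional[Segment]:
    """Parse one INTEGRAL line; return None if the line is incomplete."""
    m = PATTERN.search(line)
    if m is None:
        return None
    start, end, ext_start, ext_end = (int(g) for g in m.groups()[1:])
    return Segment(float(m.group(1)), (start, end), (ext_start, ext_end))


segment = parse_segment(
    "INTEGRAL Likelihood = -2.87 Transmembrane 28 - 44 ( 27 - 45)"
)
print(segment)
```

Returning `None` for lines that do not match keeps the function usable on listings where some fields were lost in scanning.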

An alignment of the GAS and GBS proteins is shown below: 

Identities = 33/52 (63%) , Positives = 38/52 (72%) 

Query: 126 MVAIGFKIGSDVPYCLGGGCSLVLGKGEIVKPLPTLRPCWIVLVKPDFGIST 177 

M+ IG IGSDVPYCL GC+ V GKGE+V + L W+VLVKPDFGIST 
Sbjct: 1 MMDIGIPIGSDVPYCLLSGCAQVTGKGEWCRILGLLSSWWLVKPDFGIST 52 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 106 

A DNA sequence (GBSx0109) was identified in S.agalactiae <SEQ ID 351> which encodes the amino acid 
sequence <SEQ ID 352>. This protein is predicted to be AdcR protein. Analysis of this protein sequence 
reveals the following:

Possible site: 19

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1264 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: 1 MTVLEQKLDHLVSQILLKAENQHELLFGTCQSDVKLTNTQBHILMLLSQEQLTNSDLAKK 60 

M L + ++ +++++L+7AENQHE+L G C S+V LTNTQEHILMLLS+E LTNS+LA.++ 
Sbjct: 1 MRQIAKDINAFLNEVILQAENQHEILIGHCTSEVALTNTQEHILMLLSEESLTNSELARR 60 

Query: 61 LNISQAAOTKAVKSLISQDMLKANKDSKDARITYFELSEriAKPIADEHTHHHDNTLGVYG 120 

LN+SQAAVTKA+KSL+ + Mb+ +KDSKDAR+ +++L++LA+PIA+EH HHH++TL Y 
Sbjct: 61 LNVSQAAVTKAIKSLVKEGMLETSKDSKDARVIFYQLTDLARPIAEEHHHHHEHTLLTYE 120 

Query: 121 RLVNHFSKDEKWLERFLDLFSRELE 146 

++ F+ +E+ V++RFL E++ 
Sbjct: 121 QVATQFTPNEQKVIQRFLTALVGEIK 146 

A related DNA sequence was identified in S.pyogenes <SEQ ID 353> which encodes the amino acid 
sequence <SEQ ID 354>. Analysis of this protein sequence reveals the following: 

Possible site: 28

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1535 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 106/147 (72%) , Positives = 126/147 (85%) 

Query: 1 MTVLEQKLDHLVSQILLKAENQHELLFGTCQSDVKLTNTQEHILMLLSQEQLTNSDLAKK 60 

M +LE+KLD+LV+ ILLKAENQHELLFG CQSDVKLTNTQEHILMLLSQ++LTN+DLAK 
Sbjct: 1 MGILEKKLDNLVNTILLKAENQHELLFGACQSDVKLTNTQEHILMLLSQQRLTNTDLAKA 60 

Query: 61 LNISQAAVTKAVKSLISQDMLKANKDSKnARITYFELSELAKPIADEHTHHHDNTLGVYG 120 

UJISQAA.VTKA+KSL+ QDML KD+ DAR+TYFEL+BLAKPIA EHTHHHD TL VY 
Sbjct: 61 LNI SQAA VTKAI KSLVKQDMLAGTKDTVDARVTYFELTELAKP IASEHTHHHDETIjNvYN 120 

Query: 121 RLVNHFSKDEiCWLERFLDLFSRELEG 147 

RL+ FS E ++++F+ +F+ ELEG 
Sbjct: 121 RLLQKFSAKELEIVDKFVTVFAEELEG 147 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 107 

A DNA sequence (GBSx0110) was identified in S.agalactiae <SEQ ID 355> which encodes the amino acid
sequence <SEQ ID 356>. This protein is predicted to be AdcC protein. Analysis of this protein sequence
reveals the following:

Possible site: 43

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1089 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



ISK N +GKKLRIAYLPQQIASFNAGFPS+VYEFVKSGRYPR GWFRRL HDEEHI + 



SL++VGMW++R K++GSI,SGGQKQRAVIARMFASDPD+F+LDEPTTGMDAG+ +FYELM 





1 


Sbjct: 


1 




61 


Sbjct: 


61 




121 


Sbjct: 


121 


Query: 


181 


Sbjct: 


181 



HH+AH HGK+VLMITHDP+EVK YADRNIHLVRNQ PWRCFDJVH P 



A related DNA sequence was identified in S.pyogenes <SEQ ID 357> which encodes the amino acid 
sequence <SEQ ID 358>. Analysis of this protein sequence reveals the following: 

Possible site: 43

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2722 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 190/232 (81%) , Positives = 214/232 (91%) 

Query: 1 MRYITVSGLTFQYDSDPVLEGVNYHLDSGEFVTLTGENGAAKSTLIKATLGILTPKVGTV 60 

MRYI+V L+FQY+S+PVLEG+ YHLDSGEFVT+TGENGAAKSTLIKATLGIIi PK G V 
Sbjct: 1 MRYISVKNLSFQYESEPVLEGITYHIjDSGEFVTMTGENGAAKSTLIKATLGILQPKAGRV 60 



Query: 121 SLEAVGMWDNRHKKIGSLSGGQKQRAVIARKFASDPDIFVLDEPTTGMDAGTTEKFYELM 180 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 108

A DNA sequence (GBSx0111) was identified in S.agalactiae <SEQ ID 359> which encodes the amino acid
sequence <SEQ ID 360>. Analysis of this protein sequence reveals the following:

Possible site: 36

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2299 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 109 

A DNA sequence (GBSx0112) was identified in S.agalactiae <SEQ ID 361> which encodes the amino acid
sequence <SEQ ID 362>. This protein is predicted to be AdcB protein (znuB). Analysis of this protein
sequence reveals the following:

Possible site: 36



» Seems to 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 



^-terminal signal sequence 



Likelihood =-14 
Likelihood =-11 
Likelihood =-10 
Likelihood = -8 
Likelihood = -5 
Likelihood = -4 
Likelihood = -3 
Likelihood = -3 
Likelihood = -3 



Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 



Transmembrane 



- 161 ( 136 - 172: 

- 45 ( 20 - 47: 

- 277 ( 255 - 280 

- 247 ( 227 - 253: 

- 117 ( 99 - 121 

- 202 ( 183 - 225 

- 71 ( 54 - 74: 

- 222 ( 203 - 225) 

- 94 ( 75 - 



Final Results

bacterial membrane --- Certainty=0.6731 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9487> which encodes amino acid sequence <SEQ ID 9488> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAA96187 GB:Z71552 AdcB protein [Streptococcus pneumoniae] 
Identities = 197/263 (74%), Positives = 236/263 (88%)

Query: 13 LLDMLSYDFMQRALLAWAISIFAPILGIFLILREQSLMSDTLSHVSLAGVALGWLGIS 72 

+L +LSYDF+QRA LAV+A+S+F+P+LG FLILRRQSLMSDTLSHVSL+GVA G+VLGIS 
Sbjct: 1 MLSLLSYDFIQRAFLAVIAMSLFSPVLGTFLILRRQSLMSDTLSHVSLSGVAFGLVLGIS 60 

Query: 73 PTWSTIFVVTIAAVVLEyLRTVYKHYMEISTAILMSMGLAISLIVMSKAHNVGNVSLEQY 132 

PT STI +V +AAV LEYLRTVYK +MEI TAILMS GLA+SLIVMSK + ++SL+QY 
Sbjct: 61 PWSTIAIVLIA&VFLEYLRTWKSFKEIGTAILMSTGLAVSLIVMSKGKSSSSMSLDQY 120 

Query: 133 LFGSIITIGKEQVIALFVIALITFILTILFIRPMYILTFDEDTAFVDGLPVRTMSILFNV 192 

LFGSI+TI +EQVI+LFVIA + ILT LF+RPMYILTFDEDTAFVDGLPVRTMSILFN+ 
Sbjct: 121 LFGSIVTISEEQVISLFVIAA.WHLTFLFLRPMYILTFDEDTAFVDGLPVRTMSILFHM 180 

Query: 193 VTGIAIALTI PA&GALLVSTIMVLPASIAMRIiGRNFKTVI FLGMLIGFVGMVAGI FLSYY 252 

VTG+AIAL IPAAGALLVSTIMVLPASIA+RLG+NFK+V+ L IGF+GMVAG+++SYY 
Sbjct: 181 VTGVAIALMIPAAGALLVSTI^IVLPASIALRLGKNFKSVMLLASAIGFLGMVAGLYISYY 240 

Query: 253 WETPASATITMIFIGIFLLVSLV 275 

ETPASA+IT+IF+ +F+L+SLV 
Sbjct: 241 AETPASASITIIFVTVF1LISLV 263 

A related DNA sequence was identified in S.pyogenes <SEQ ID 363> which encodes the amino acid 
sequence <SEQ ID 364>. Analysis of this protein sequence reveals the following: 



>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = -3.51 



251 - 267 
221 - 23',' 



Transmembrane 196 



123 - 162) 



245 - 270! 
217 - 243 



171 - 215! 



Final Results

bacterial membrane --- Certainty=0.6986 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:



Query: 3 MLDILFYDFMQRAVMAWAI S I FAPILGI FLI LRRQSLMSDTLSHVSLAGVALGWLGI S 62 

ML +L YDF+QRA +AV+A+S+F+P+LG FLILRRQSLMSDTLSHVSL+GVA G+VLGIS 
Sbjct: 1 MLSLLSYDFIQRAFLAVIAMSLFSPVLGTFLILRRQSLMSDTLSHVSLSGVAFGLVLGIS 60 

Query: 63 PTITTI I WVIAAILLEYLRWYKHYME I STAILMSLGLALSLI IMSKSHSSSSMSLEQY 122 

PT++TI +V++AA+ LEYLR VYK +MEI. TAILMS GLA+SLI+MSK SSSSMSL+QY 
Sbjct: 61 PTVSTIAIVLIAAVFLEYLRTVYKSFMEIGTAILMSTGLAVSLIVMSKGKSSSSMSLDQY 120 

Query: 123 LFGS I ITISMEQWALFAIAAI ILILTVLFIRPMYILTFDEDTAFVDGLPVRLMSVLFNI 182 

LFGSI+TIS EQV++LF IAA++LILT LF+RPMYILTFDEDTAFVDGLPVR MS+LFN+ 
Sbjct: 121 LFGSIVTISEEQVISLFVIAAWLILTFLFLRPMYILTFDEDTAFVDGLPVRTMSILFNM 180 

Query: 183 VTGVAIALTIPAAGALLVSTIMVLPASIAMRLGKNFKTVILLGIVIGFSGMLSGIFLSYF 242 

VTGVAIAL I PAAGALLVSTIMVLPAS IA+RLGKNFK+V+LL IGF GM++G+++SY+ 
Sbjct: 181 VTGVAIALMIPflAGALLVSTIMVLPASIALRLGKNFKSVMLLASAIGFLGMVAGLYISYY 240 

Query: 243 FETPASATITMIFISIFLLVSL 264 

ETPASA+IT+IF+++F+L+SL 
Sbjct: 241 AETPASASITIIFVTVFILISL 262 



An alignment of the GAS and GBS proteins is shown below:

Identities = 223/270 (82%), Positives = 252/270 (92%)

Query: 12 MLnDMLSYDFMQRALIAWAISIFAPILGIFLILRRQSLMSDTBSHVSLaGVALGVVLGI 71 
++LD+L YDFMQPJV++AWAISIPAPILGIFLILRRQSLMSDTLSHVSLAGVALGWLGI 
Sbjct: 2 VMLDILFTOFMQRAVMAWAISIFAPILGIFLILRRQSLMSDTLSHVSLAGVALGVVLGI 61 

Query: 72 S PTWSTI FWTLAAVVLEYLRTVYKHYME I STAI LMSMGLAI SL IVMSKAHNVGNVSLEQ 131 
SPT +TI W LAA++LEYLR WKHYMEISTAILMS4GLA+SLI+MSK+H+ ++SLEQ 
Sbjct: 62 SPTITTIIWVLAAILLEYLRVVYKEYMEISTAILMSLGLiAIiSLIIMSKSHSSSSMSLEQ 121 

Query: 132 YLFGSIITIGKEQVIALFVIALITFILTILFIRPMyiLTFDEDTAFVDGLPVRTMSILFN 191 
YLFGSIITI EQV+ALF IA I ILT+LFIRPMYILTFDEDTAFVDGLPVR MS+LFN 
Sbjct: 122 YLFGSIITISMEQWALFAIAAIILILTVLFIRPMYILTFDEDTAFVDGLPVRLMSVLFN 181 

Query: 192 VVTGIAIALTIPAAGALLVSTI^m^ASIAMRLGRNFKTVIFLGMLIGFVG^WAGIFLSY 251 
+VTG+AIALTIPAAGALLVSTIMVLPASIAMRLG+NFKTJI LG++IGF GM++GIFLSY 
Sbjct: 182 IVTGVAIALTIPAAGALLVSTIMVLPASIAMRLGKNFKTOILLGIVIGFSGMLSGIFLSY 241 

Query: 252 YWETPASATITMI FIGI FLLVSLVGLLRKR 281 
4 +ETPASATI TMI F I IFLLVSL G+L+KR 
Sbjct: 242 FFETPASATITMI FI S I FLLVSLGGMLKKR 271 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
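
As a rough illustration of how the summary line printed with each alignment can be read, the following Python sketch extracts the count pairs from a line of the form "Identities = a/b (p%), Positives = c/d (q%)" and returns the underlying fractions. The function name and the regular expression are assumptions made for illustration only; they are not part of the analysis described herein.

```python
import re

# Hypothetical helper (not from the source): pulls the count pairs out of a
# BLAST-style summary line. Stray spaces around "=" and "/" are tolerated.
_STAT = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_alignment_stats(line):
    """Return {name: (matched, aligned_length, fraction)} for each statistic."""
    stats = {}
    for name, num, den in _STAT.findall(line):
        num, den = int(num), int(den)
        stats[name] = (num, den, num / den)
    return stats


if __name__ == "__main__":
    line = "Identities = 223/270 (82%), Positives = 252/270 (92%)"
    for name, (num, den, frac) in parse_alignment_stats(line).items():
        print(f"{name}: {num}/{den} = {frac:.3f}")
```

The sketch reports the raw fraction and makes no assumption about how the printed percentage was rounded.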

Example 110 

A DNA sequence (GBSx0113) was identified in S.agalactiae <SEQ ID 365> which encodes the amino acid 
sequence <SEQ ID 366>. This protein is predicted to be streptodornase. Analysis of this protein sequence 
reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2601 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA59264 GB:X84793 streptodornase [Streptococcus pyogenes] 
Identities = 58/167 (34%) , Positives = 85/167 (50%) , Gaps = 30/167 (17%) 

Query: 2 TPIYEGNNLVPSRVELQYVG I DKQGKLLEI KLGGGKEQVDEYGVTTVTLENTS PLAKIDY 61 

TP+Y+G+ L+P V + + D +DE TV + N IDY 
Sbjct: 245 TPVYQGSELLPRAVLVSALSSDGF IDE TVRVFNNVAGFNIDY 286 

Query: 62 KTGMLIKEDGKQAEEGEDPNSDADENEAAIE-SASDIEE^^IOTOTSESDTNNVAPQNRIV 120 

+ G L+ E P ++ D E +E + IE4- +T+T + D N++ Q + V 

Sbjct: 287 QNGGLLTES PVTETDNVEENVEDNIETIEDEVDTDTLKKDDENISLQ-KTV 336 

Query: 121 YVANKGRSNTYWYSLENI-KNANTAWIVQMTEQEAENQHKHHSTTEA 166 

YVA+ G SN YWYS EN+ KN N +V+M+EQ AL + KHHS EA 
Sbjct: 337 YVASSGLSNvYWYSKENMPKNvWLDKVVEMSEQ/TAIiARGKHHSAQE^ 383 

A related DNA sequence was identified in S.pyogenes <SEQ ID 367> which encodes the amino acid 
sequence <SEQ ID 368>. Analysis of this protein sequence reveals the following: 

Possible site: 31 
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 51/90 (56%) , Positives = 65/90 (72%) , Gaps = 4/90 (4%) 

Query: 1 MTPIYEGNNLVPSRWLQYVGIDKCGKIjLEIKIGGGKEQ\7DEYGVTTVTLENTSPIiAKID 60 
+TP+Y N LVP +V LQYVGID+ G LL+IKLG KE VD +GVT+VTL+N SPLA++D

Sbjct: 182 VTPvTHKNELVPRQWLQYVGIDEMGDLLQIKI^SEKESVDOTGVTSVTLDNVSPLAELD 241 

Query: 61 YKTGMLIKEDGKQAEEGEDPNSDADENEAA 90 
Y+TGM++ D QE EDN + +EEA 
Sbjct: 242 YQTGMML--DSTQNE--EDSNLETEEFEEA 267

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
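
The "Final Results" lines reported for each protein list a certainty score per compartment, and the predicted localisation is the compartment with the highest score. A minimal Python sketch of reading such lines is given below. The parser, its function names and its tolerance for stray spaces in the numbers are assumptions made for illustration; they are not part of the prediction program itself.

```python
import re

# Hypothetical parser (not part of the source analysis). It tolerates stray
# spaces inside the number, e.g. "0. 3000" or "0 .3000", and optional dashes
# between the compartment name and the word "Certainty".
_LOC = re.compile(
    r"bacterial\s+(\w+)[-\u2014\s]*Certainty\s*=\s*([0-9.\s]+?)\s*\(([^)]*)\)"
)


def parse_localization(text):
    """Map each compartment name to (certainty, call) for every line found."""
    results = {}
    for compartment, certainty, call in _LOC.findall(text):
        value = float(re.sub(r"\s+", "", certainty))
        results[compartment] = (value, call.strip())
    return results


def most_likely(results):
    """Return the compartment with the highest certainty score."""
    return max(results, key=lambda name: results[name][0])


if __name__ == "__main__":
    report = (
        "bacterial outside --- Certainty=0.3000 (Affirmative) < succ>\n"
        "bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>\n"
        "bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>\n"
    )
    parsed = parse_localization(report)
    print(most_likely(parsed), parsed)
```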

Example 111 

A DNA sequence (GBSx0114) was identified in S.agalactiae <SEQ ID 369> which encodes the amino acid
sequence <SEQ ID 370>. This protein is predicted to be tyrosyl-tRNA synthetase (tyrS-1). Analysis of this
protein sequence reveals the following:

Possible site: 60

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3618 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC00303 GB:AF008220 tyrosine tRNA synthetase [Bacillus subtilis] 
Identities = 234/420 (55%), Positives = 311/420 (73%), Gaps = 2/420 (0%) 


Query: 2 NIFDELKERGLVFQTTDEDALRKALEEGSVSYYTGYDPTADSLHLGHLVAILTSRRLQLA 61 

N+ ++L RGL+ Q TDE+ L K L E + Y+G+DPTADSLH+GHL+ ILT RR QLA 
Sbjct: 3 NLLEDLSFRGLIQQMTDEEGLNKQLNEEKIRLYSGFDPTADSLHIGHLLPILTLRRFQLA 62 

Query: 62 GHKPYALVGGATGLIGDPSFKDVERSLQTKICrWSWGNKIRC-QLSNFLEFETGnNKAVLV 121

GH P ALVGGATGLIGDPS K ER+L T V K KI+ QLS FL+FE +N AV+ 
Sbjct: 63 GHHPIALVGGATGLIGDPSGKKAERTIiNTADIVSEKSQKIKNQLSRFLDFEAAENPAVIA 122 

Query: 122 NNYDWFSNISFIDFLRDVGKYFTWYMMSKESVKKRIETGISYTEFAYQIMQGYDFYELN 181 
NN+DW ++ IDFLRDVGK F +NYM++K++V RIE+GISYTEF+Y I+Q YDF L

Sbjct: 123 NNFDWIGKMNVIDFLRDVGKNFGINYMLAKDTVSSRIESGISYTEFSYMILQSYDFLNLY 182 

Query: 182 KNY2m'LQIGGSDQWGNIOTAGTELIRR--KSI^ 239 
++ N LQIGGSDQWGN+TAG ELIR+ + + +T+PL+T + G KFGK+EG A+W 
Sbjct: 183 RDKNCKLQIGGSDQWGNITAGLELIRKSEEEGAKAFGLTIPLVTKADGTKFGKTEGGAIW 242

Query: 240 LDADKTSPYE^QFWLNVMDADAVRFLKIFTFLSLKEIEDIRIQFEEAPHQRLAQKTLAR 299 

LD +KTSPYE YQFW+N D D V++LK FTFLS +EIE + E AP +R AQK LA 
Sbjct: 243 LDKEKTSPYEFYQFWINTDDRDVVKYLOFTFLSKEEIEAYAEKTETAPEKREAQKRLAE 302 


Query: 300 EWTLVHGEKAYKEAVNITEQLFAGNI KGLSVKELKCGLRGVPNYHVQTEDNLNI IDLLV 359 

EV +LVHG +A ++A+NI++ LF+GNIK LS +++K G + VP+ V + L+++D+LV 
Sbjct: 303 EVTSLVHGREALEQAINISQALFSGNIKELSAQDVKVGFKDVPSMEVDSTQELSLVDVLV 362 






Query: 360 TSGVVNSKRQAREDVSNG7AIYINGDRIQDLEYTI3ENDKLE!NEITVIRRGKKKYFVLNFK 419
S + SKRQARED+ NGA+YING+R ++ YT+S D++EN+ TV+RRGKKKYF++ +K
Sbjct: 363 QSKLSPSKRQAREDIQNGAWINGERQTEINYTLSGEDRIENQFTVLFJ?GKKKYFLVTYK 422 

A related DNA sequence was identified in S.pyogenes <SEQ ID 371> which encodes the amino acid 
sequence <SEQ ID 372>. Analysis of this protein sequence reveals the following:

Possible site: 37

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2340 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 344/418 (82%) , Positives = 377/418 (89%) 



Query: 1 MNIFDELKERGLVFQTTDEDALRKALEEGSVSYYTGYDPTADSLHLGHLVAILTSRRLQL 60
       MNIF+ELK RGLVFQTTDE AL KAL EG VSYYTGYDPTADSLHLGHLVAILTSRRLQL
Sbjct: 1 MNIFEELKARGLVFQTTDEQALVKALTEGQVSYYTGYDPTADSLHLGHLVAILTSRRLQL 60

Query: 61 AGHKPYALVGGATGLIGDPSFKDVERSLQTKKTWSWGNKIRGQLSNFLEFETGDNKAVL 120
       AGHKPYALVGGATGLIGDPSFKD ERSLQTK+TV+ W +KI+GQLS FL+FE GDNKA L
Sbjct: 61 AGHKPYALVGGATGLIGDPSFKDAERSLQTKETVLEWSDKIKGQLSTFLDFENGDNKAEL 120

Query: 121 VNNYDWFSNISFIDFLRDVGKYFTVNYMMSKESVKKRIETGISYTEFAYQIMQGYDFYEL 180
       VNNYDWFS ISFIDFLRDVGKYFrVNYMMSK+SVKKRIETGISYTEFAYQIMQGYDFYEL
Sbjct: 121 vNNYDWFSQISFIDFLRDVGKYFTVNYmSKDSVKKRIETGISYTEFAYQIMQGYDFYEL 180

Query: 181 NKNYNVTLQIGGSDQWGIS1MTAGTELIRRKSNGVSHVMTVPLITDSTGKKFGKSEGNAVWL 240
       N +NVTLQIGGSDQWGNMTAGTEL+R+K++
Sbjct: 181 NDKHNVTLQIGGSDQWGNMTAGTELLRKKADKTGHVMTWPLI'I

Query: 241 DADKTSPYENTYQFWIJWMDADATOFLKIFTFLSIjKEIEDIRIQFEEAPHQRLAQKTLARE 300
       DADKTSPYEMYQFWLNVMD DAVRFLKI FTFLSL EI +1 QF A H+RLAQKTLARE
Sbjct: 241 DADKTS P YEMYQFWLNVNDDDAVRFLKI FTFLSLDEI AE I ETQFNAARHERLAQKTLARE 300

Query: 301 VVTLVHGEKAYKEAVNITEQLFAGNIKGLSVKELKQGLRGVPNYHVQTEDNIjNIIDLLVT 360
       WTLVHGE+AYK+A+NITEQLFAGNIK LS ELKQGL VPNYHVQ+ DN NI+++LV
Sbjct: 301 VVTLVHGEEAYKQAIoNITEQLFAGNIl^SAIvELKCGLSNVPNYHVQSIDNfnilVEILVA 360

Query: 361 SGVVNSKKQAREDVSNGAIYINGDRIQDLEYTISENT3KLENEITVIRRGKKKYFVLNF 418
       + + SKRQAREDV NGAIYINGDR+QDL+Y +S +DK+++++TVIRRGKJOCY VL +
Sbjct: 361 AKISPSKRQAREDVQNGAIYINGDRVQDLDYQLSNDDKIDDQLTVIRRGKKKYAVLTY 418

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 112 

A DNA sequence (GBSx0115) was identified in S.agalactiae <SEQ ID 373> which encodes the amino acid
sequence <SEQ ID 374>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-12.21 Transmembrane 36 - 52 ( 23 - 59) 

Final Results 

bacterial membrane --- Certainty=0.5883 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAF04736 GB:AF101781 penicillin-binding protein 1b
[Streptococcus pneumoniae] 
Identities = 445/769 (57%), Positives = 581/769 (74%), Gaps = 9/769 (1%) 

Query: 3 KGNKKLNSSKLGDYTP LEFGSIFLRI VKLLSDPIYVIILLFVMLGVGIAVGYL 55 

K K KG T L+ +IF I +K L + ++V+ L MLG G+A+GY 

Sbjct: 21 KNKKSARPGKKGSSTKKSKTLDKSA1FPAILLSIKALFNLLFVLGFLGGMLGAGIALGYG 80 

Query: 56 ASQVDSVICVPSKNSLOTQVNTLTRVSRLTYSDICSQISEIATDLQRTPVAKDAISDNIKKA 115 

+ D V+VP LV QV ++ +S +TYSD + 1+ I +DL RT ++ + IS+N4KKA 
Sbjct: 81 VALFDKVRVPQTEELVNQVKDISSISEITYSDGTVIASIESDLLRTSISSEQISENLKKA 140 

Query: 116 IIATEDENFNDHKGWPKAVLRAAAGSVLGFGESSGGSTLTQQLLKQQILGDDPSFKRKS 175 

IIATEDE+F +HKGWPKAV+RA G +G G SSGGSTLTQQL+KQQ++GD P+ RK+ 
Sbjct: 141 IIATEDEHFKEHKGWPKAVIRATLGKF\'GLGSSSGGSTLTQQLIKQQWGDAPTLARKA 200 

Query: 176 KEIIYALALERYMDKDSILSDYLNVSPFGRNNKGQNIAGIEEAAQGIFGVSAKDLTIPQA 235 

EI+ ALALER M+KD IL+ YLNV+PFGRNNKGQNIAG +AA+GIFGV A LT+PQA 
Sbjct: 201 AEIVDAL^ERAMNKDEILTTYLNVAPFGRMNKGQNIAGARQAAEGIFGVDASQLTVPQA 260 

Query: 236 AFLAGLPQSP I VYS PYTADAQLKSDKDLS FGI KRQKWLYNMYRTRALTKDEYKS YKDYD 295 

AFLAGLPQSPI YSPY +LKSD+DL G++R K VLY+MYRT AL+KDEY YKDYD 
Sbjct: 261 AFLAGLPQSPITYSPYENTGELKSDEDLEIGLRRS.KAVLYSMYRTGALSKDEYSQYKDYD 320 

Query: 296 IKCTFIKPAVATTNHHDYLYYSALSEAQKVKYOTLIKKDNVSEHDLKMDETRATYRHRA.I 355 

+K+DF+ T DYLY++ L+EAQ+ MY+YL ++DNVS +LKN+ T+ YR A 

Sbjct: 321 LKQDFLPSGTVTCISRDYLYFTTLAEAQERMYDYIAQRDOTSAKELKNEATQKFYRDLAA 380 

Query: 356 EEIQQGGYTIKTTINKSVYQAMQDAAAQYGGLLDDGTGKVQMGNVLTDNSSGAIIGFIGG 415 

+EI+ GGY I TTI++ ++ AMQ A A YG LLDDGTG+V++GNVL DN +GAI+GF+GG 
Sbjct: 381 KEIENGGYKITTTIDQKIHSAMQSAVADYGYLLDDGTGRVEVGNVLMDNQTGAILGFVGG 440 

Query: 416 RNYSENQNNHAFDTAKSPHSSIKPILPYGIAIDQGMLGSGSVLSNYPTTYSSGEKIMHAD 475 

RNY ENQNNHAFDT RSP S+ KP+L YGIAIDQG++GS ++LSNYPT +++G IM+A+ 
Sbjct: 441 RNYQENQNNHAFDTKRSPASTTKPLLAYGIAIDQGLMGSETILSNYPTNFANGNPIMYAN 500 

Query: 476 EEGTAMTOLQESLDISTOIPAFWTYKMLRDRGVDVK1TOIEKLDYPIENFGIESLPLGGGI 535 

+GT M+ L E+L+ SWNIPA+WTY+MLR+ GVDVK YMEK+ Y I +GIESLP+GGGI 
Sbjct: 501 SKGTGMMTLGEALNYSWNIPAY - W'T("RMLREK'GVDVKGYI'IEKMGYEIPEYGIESLPMGGGI 560 

Query: 536 DTSVAQQTNLYQMIANGGVYHKQYMIESIEDSNGKVIYNHESKPVRVFSKATATILQQLL 595 

+ +VAQ TN YQ +AN GVYH++++I IE ++G+V+Y ++ KPV+V+SKATATI+Q LL 
Sbjct: 561 EVTVAQHTNGYQTLAiraGVYHQKHVISKIEAADGRVVYEYQDKPVQVYSKATATIMQGLL 620 

Query: 596 HGPINSGKTTTFKNRLQGLNSGIAGVDWIGKTGTTNSTSDVWLMLSTPKVTLGGWAGHDN 655 

++S TTTFK+ L LN LA DWIGKTGTTN ++WLMLSTP++TLGGW GHD+ 
Sbjct: 621 REVLSSRVTTTFKSNLTSLNPTLANADWIGKTGTTNQDEravIl'JLMIjSTPRLTLGGWIGHDD 680 

Query: 656 NASLAKLTGYNNNANYMAHLWAINNADGOTFGKSERFRLDDSVIKAKVLKSTGLQPGW 715 

N SL++ GY+NN+NYMAHLVNAI A + +G +ERF LD SV+K++VLKSTG +PG V 
Sbjct: 681 IfflSIiSRRAGYSIOTSNYMAHLVNAIQQASPSIWG-NERFALDPSWKSEVLKSTGQKPGKV 739 

Query: 716 TVNGRRITVGGESTTSYWA-KNGPGTMTYRFAIGGTDSDYQKAWSTLGG 763 

+V G+ + V G + TSYWA K+G +YRFAIGG+D+DYQ AWS++ G 
Sbjct: 740 SVEGKEVEVTGSTvTSYWANKSGAPATSYRFAIGGSDADYQNAWSSIVG 788 

A related DNA sequence was identified in S.pyogenes <SEQ ID 375> which encodes the amino acid 
sequence <SEQ ID 376>. Analysis of this protein sequence reveals the following: 

Possible site: 57 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.83 Transmembrane 39 - 55 ( 32 - 60) 

Final Results

bacterial membrane --- Certainty=0.2932 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAF04736 GB:AF101781 penicillin-binding protein 1b
[Streptococcus pneumoniae] 
Identities = 438/739 (59%), Positives = 580/739 (78%), Gaps = 2/739 (0%) 

Query: 27 PVIjLRTLRLLSNFFYIVIFLFGMMGFGMAFGYLASQIESVKVPSKESLVKQVESLTMISQ 85 

P +L +++ L N +++ FL GM+G G+A GY + + V4VP E LV QV+ ++ IS+ 
Sbjct: 48 PAILLSIICALFNLLFVLGFLGGMLGAGIALGYGVALFDKVRVPQTEELVNQVKDISSISE 107 

Query: 87 MNYSDNSLISTLDTDLLRTPVANDAISENIKJCAIVSTEDEHFQEHKGIVPKAVFRATIAS 14S 

+ YSD ++I+++++DLLRT ++++ ISEN4-KKAI++TEDEHF+EHKG+VPKAV RATL 
Sbjct: 108 ITYSDGTVIASIESDLLRTSISSEQISENLKKAI IATEDEHFKEHKGWPKAVIRATLGK 167 

Query: 147 vXjGFGEASGGSTLTQQLVKQQVLGDDPTFKRKSKEIVTAIALERYMSKDNILCDYLNVSP 20E 

+G G +SGGSTLTQQL+KQQV+GD PT RK+ EIV ALALER M+KD IL YLNV+P 
Sbjct: 168 FVGLGSSSGGSTLTQQLIKQQWGDAPTLARKAAEIVDAIALERAMNKDEILTTYLNVAP 227 

Query: 207 FGRNNKGQNIAGVEEAARGIFGVSAKDLTVPQAA.FLAGLPQSPIVYSPYLSTGQLKSEKD 266 

FGRNNKGQNIAG +AA GIFGV A LTVPQAAFLAGLPQSPI YSPY +TG+LKS++D 
Sbjct: 228 FGRNNKGQNIAGARQAAEGIFGVDASQLTVPQflAFLAGLPQSPITYSPYENTGELKSDED 287 

Query: 267 MAYGIKRQQNVLFNMYRTGVLSKKEYEDYKAYP1QKDFIQPGSAIVNNHDYLYYTVLADA 326 

+ G++R + VL++MYRTG LSK EY YK Y +++DF+ G+ + DYLY+T IA+A 
Sbjct: 288 LEIGLRRAKAVLYSMYRTGALSKDEYSQYECDYDLKQDFLPSGTVTGISRDYLYFTTLAEA 347 

Query: 327 KKAMYSYLIKI^KVSSRDLKNDETKARYEERALTELQQGGYTITTTINKPIYNAMQTAAA 386 

+ + MY YL +RD VS+++LKN+ T+ Y + A E++ GGY ITTTI++ I++AMQ+A A 
Sbjct: 348 QERI^YIAQRDNVSAKELKNEATQKFYRDLAAKEIEWGGYKITTTIDQKIHSAMQSAVA 4 07 

Query: 387 QFGGLLDDGTGTVQMG^n^TDmTGAVIXSFVGGRDYALNQNNHAFNTVRSPGSSIKPIIA 446 

+G LLDDGTG V++GNVL DN TGA+LGFVGGR+Y NQNNHAF+T RSP S+ KP++A 
Sbjct : 408 DYGYLLDDGTGRVEVGNvLMDNQTGAILGFVGGEHYQENQNNHAFDTKRSPASTTKPLLA 467 

Query: 447 YGPAIDC^LMGSASvISNYPTTYSSGQKIMHADSEGTAMMPLQEALNTSWNIPAFWTQKL 506 

YG AIDQGLMGS ++LSNYPT +++G IM+A+S+C-T MM L EALN SWNIPA+WT ++ 
Sbjct: 468 YGIAIDQGLMGSETILSNYPTNFANGNPIMYANSKGTGMMTLGEAIiNYSWNIPAYWTYRM 527 

Query: 507 LREKGVDVENYMTKMGYKIADYSIESLPLGGGIEVSVAQQTNAYQMLSNNGLYQKQYIVD 566 

LRE GVDV+ YM KMGY+I +Y IESLP+GGGIEV+VAQ TN YQ L+NNG+Y +++++ 
Sbjct: 528 LRENGVDVKGYMEKMGYE I PEYGIESLPMGGG I E VT VAQHTNGYQTLANNG VYHQKHVI S 587 

Query: 567 KITASDGTWYKHENKP IRI FSAATATI LQELLRGP I TSGATTTFKNRIAAINPWIANAD 626 

KI A+DG WY++++KP++++S ATATI+Q LLR ++S TTTFK+ L ++NP LANAD 
Sbjct: 588 KIEAADGRWYEYQDKPVQWSKATATIMQGLLREVLSSRVTTTFKSI^TSIjNPTLANAD 647 

Query: 627 WIGKTGTTENYTDWLVLSTPKOTLGGWAGHDDOT'SIAPLTGYNNNSNYIAYLAIIAINQA 686 

WIGKIGTT ++WL+LSTP++TLGGW GHDDN SL+ GY+NNSNY+A+L MM QA 
Sbjct: 648 WIGKTGTTNQDENMWLMLSTPRLTLOTWIGHDDNHSLSRRAGYSNNSNYMAHLVNAIQQA 707 

Query: 687 DPWIGVGQRFNU3PGVIKANVLKSTGLQPGTVNVNGHTFSVGGEMTTSLWSQK-GPGAM 745 

P++ G +RF LDP V+K+ VLKSTG +PG V+V G V G TS W+ K G A 
Sbjct: 708 SPSIWG-NERFALDPSVvKSEVLKSTGQKPGIWSVEGKEVEVTGSTVTSYWANKSGAPAT 766 

Query: 746 TYRFAIGGTDADYQKAWGN 764 

+YRFAIGG+DADYQ AW + 
Sbjct: 767 SYRFAIGGSDADYQNAWSS 785 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 531/760 (69%) , Positives = 639/760 (83%) , Gaps = 3/760 (0%) 
Query: 6 KKI^SSKLGDYTPLEFGSIFLRIvKLLSDFIYVIILLFVMLGVGLAVGYIASQVDSVKVP 65

Query: 66 SKNSLVTQWTLTRVSRLTYSDKSQISEIATDLQRTPVAKDAISDNIKKAIIATEDENEN 125

SK SLV QV +LT +S++ YSD S IS + TDL RTPVA DAIS+NIKKAI++TEDE+F 
Sbjct: 70 SKESLVKQVESLTMISQMNYSDNSLISTLDTDLLRTPVANDAISENIKKAIVSTEDEHFQ 129 

Query: 126 DHKGWPKAVLRAAAGSVLGFGESSGGSTLTQQIiLKQQILGDDPSFKRKSKEIIYALALE 185 

+HKG+VPKAV RA SVLGFGE+SGG3TLTQQL+KQQ+LGDDP+FKRKSKEI+YALALE 
Sbjct: 130 EHKGIVPKAVFRATIASVLGFGEASGGSTLTQQLVKQQVLGDDPTFKRKSKEIVYALALE 189 

Query: 186 RYMDKDSILSDYLNVSPFGRNNKGQNIAGIEEAAQGIFGVSAKDLTIPQAAFLAGLPQSP 245 

RYM KD+IL DYLKVSPFGRNNKGQNIAG+EEAA+GIFGVSAKDLT+PQAAFLAGLPQSP 
Sbjct: 190 RYMSKDNI LCDYLNVS PFGRNNKGQNI AGVEEAARGI FGVSAKDLTVPQAAFLAGLPQS P 249 

Query: 246 IVYS PYTADAQLKSDKDLS FGIKRQKIvrVLYNMYRTRALTKDEYKS YKDYD I KKDFI KPAV 3 05 

IVYSPY + QLKS+KD+++GIKRQ+NVL+NMYRT L+K EY+ YK Y I+KDFI+P 
Sbjct: 250 IVYSPYLSTGQLKSEECDMAYGIKRQQNVLFNMYRTGVLSKKEYEDYKAYPIQKDFIQPGS 309 

Query: 306 ATTNHHDYLYYSALSEAQKVMYNYLIKKDNVSEHDLKNDETRATYRHRAIEEIQQGGYTI 365 

A N4HDYLYY+ L++A+K MY+YLIK+D VS DLKNDET+A Y RA+ E+QQGGYTI 
Sbjct: 310 AIVNlfflDYLYYTVIADAKKAMYSYLIKRDKVSSRDLKNDETKAAYEERALTELQQGGYTI 369 

Query: 366 KTTIWKSVYQAMQDAAAQYGGLLDDGTGKVQMGNVLTDNSSGAIIGFIGGRldYSENQNNH 425 

TTINK +Y AMQ AAAQ+GGLLDDGTG VQMGWLTDN++GA+-.GF+GGR+Y+ NQNNH 
Sbjct: 370 TTTINKPIYNAMQTAAAQFGGLLDDGTGTVQMGNVLTDNATGAVLGFVGGRDYALNQNNH 429 

Query: 426 AFDTARSPGSSIKPILPYGIAIDQGMLGSGSVLSNYPTTYSSGEKIMHADEEGTAMVHLQ 485 

AF+T RSPGSSIKPI+ YG AIDQG++GS SVLSNYPTTYS SG+ KIMHAD EGTAM+ LQ 
Sbjct: 430 AENTVRSPGSSIKPIIAYGPAIDQGLMGSASVLSNYPTTYSSGQKIMHADSEGTAMMPLQ 489 

Query: 486 ESLDISVmiPAFWTYKMbRDRGVDVKNYMEKLDYPIENFGIESLPLGGGIDTSVAQQTNL 545 

E+L+ SWNIPAFWT K+LR++GVDV+NYM K+ Y I ++ IESLPLGGGI+ SVAQQTN 
Sbjct: 490 E!ALNTSWNIPAFWTQICLLREKGVDVT31TYMTKMGYKIADYSIESLPLGGGIEVSVAQQraA 549 

Query: 546 YQMIANGGVYHKQYMIESIEDSNGKVIYNHESKPVRVFSKATATILQQLLHGPINSGKTT 605 

YQM++N G+Y KQY+++ I S+G V+Y HE+KP+R+FS ATATILQ+LL GPI SG TT 
Sbjct: 550 YQMLSNNGLYQKQYIVDKITASDGTWYKHENKPIRIFSAATATILQELLRGPITSGATT 609 

Query: 606 TFKNRLQGLNSGIAGVDWIGKTGTTNSTSDVWLMLSTPKVTLGGWAGHDNNASLAKLTGY 665 

TFKNRL +N LA DWIGKTGTT + +DVWL+LSTPKVTLGGWAGHD+N SLA LTGY 
Sbjct: 610 TFKI^LAAINPWLANADWIGKTGTTEOTTDWLVLSTPKVTLGGWAGHDDKTSLAPLTGY 669 

Query: 666 NNNAimiAHLVmiMIADGOTFGKSERFRLDDSVIKAK^KSTGLQPGVVTWGRRITVG 725 

NNN+NY+A+L NAIN AD N G +RF LD VIKA VLKSTGLQPG V VNG +VG 
Sbjct: 670 NHNSNYIAYLANAINQADPITOIGVGQRFKrJDPGVIKANVLKSTGLQPGTVNUN 729 

Query: 726 GESTTSYWAKNGPGTMTYRFAIGGTDSDYQKAWSTLGGKR 755 

GE TTS W++ GPG MTYRFAIGGTD+DYQKAW G ++ 
Sbjct: 730 GEMTTSLWSQKGPGAMTYRFAIGGTDADYQKAWC-NFGFRK 769 

SEQ ID 374 (GBS64d) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 120 (lane 2-4; MW 107kDa). It was also expressed in E.coli as a His-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 120 (lane 5-7; MW 82kDa) and in 
Figure 179 (lane 2; MW 82kDa). 

GBS64d-His was purified as shown in Figure 231, lane 7-8. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 113

A DNA sequence (GBSx0116) was identified in S.agalactiae <SEQ ID 377> which encodes the amino acid 
sequence <SEQ ID 378>. This protein is predicted to be DNA-dependent RNA polymerase subunit beta 
(rpoB). Analysis of this protein sequence reveals the following: 

Possible site: 61

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3505 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB56706 GB:Y16468 DNA-dependent RNA polymerase subunit beta 
[Listeria monocytogenes]

Identities = 814/1173 (69%), Positives = 978/1173 (82%), Gaps = 17/1173 (1%) 

Query: 2 AGHEVQYGKHRTRRSFSRIKEVLDLPNLIEIQTDSFQDFLDAGLKEVFEDVLPISNPTDT 61 
+GH+V+YG+HRTRRSF+RI EVL+LPNLIEIQT S+Q FLD GIi+E+F D+ PI +F 
Sbjct: 5 SGHDVKYGRHRTRRSFARISEVLELPNLIEIQTASYQWFLDEGLREMFRDISPIEDFAGN 64



Query: 62 MDLEFVGYELKEPKYTLEEARIHDASYSAPIFVTFRLVNKETGEIKTQEVFFGDFPIMTE 121 



Query: 122 MGTFIINGGERIIVSQLWSPGWFM3KVDICJGKVGYGSTVIPNRGAWLELETDAKDIAY 1B1 

MGTFIING ER+ 1 VSQLVRSPGVYFN K+DKNGK G+GSTVIPNRGAWLE ETDAKD+ + 
Sbjct: 125 MGTFIINGAERVIVSQLWSPGWFNGKI£)KNGKKGFGSTvTPNRGAWLEYETDAKDVVH 184 

Query: 182 TRIDRTRKIPFTTLVRALGFSGDDEIVDIFGDSEIjVRNTIEKDIHKNPSDSRTDEALKEI 241 

RIDRTRK+P T L+RALGF D EI+D+ GD++ +RNT+EKD N ++AL EI 

Sbjct: 185 VRIDRTRKLPVTVLLRALGFGSDQEIIDLIGDNDYLRNTLEKDNTDN AEKALLEI 239 



YERLRPGEP T D++RSLLV+RFFDP+RYDLA+VGRYKINKKL+LK RL NQT+AE LVD 



Query: 242 
Sbjct: 240 

Query: 302 GETGEIL\/EAGTVMTRDVIDSIAEHIDGDLNKFVYTPM3YAVVTEPVILQKFKVVAPTDP 361 

ETGEI+ G ++ R +D I +++ + P D V+ + V++Q K+ AP D 

Sbjct: 300 PETGEIIASKGDILDRRNLDQIIPI^LENGVGFR7LRPTD-GVMEDSVLVQSIKIYAPNDE 358 



Query: 362 DRWTIVGNSNPEDKVRALTPADIIAEMSYFI 

++ + I+GN+ E+ V4 +TP+DI++ +SYF NL G+G DDIDHLGNRR+R+VGELL 
Sbjct: 359 EKEINIIGNAYIEENVKHITPSDIISSISYFFNLLHGVGDTDDIDHLGNRRLRSVGELLQ 418 



Sbjct: 419 

Query: 482 SELSHKRRLSALGPGGLTRDRAGYEVRDVHYTHYGRMCPIETPEGPNIGLINNLSSFGHL 541 

EL+HKRRLSALGPGGLTR+RAGYEVRDVHY+HYGRMCPIETPEGPNIGLIN+LSSF + 
Sbjct: 479 GELTHKRRLSALGPGGLTRERAGYEVRDVHYSHYGRI1CPIETPEGPNIGL1NSLSSFAKV 538 



Query: 542 NKYGF I QTPYRKVDRSTGAVTNE I VWLTADEEDEFTVAQANSKLNEDGTFAEEI VMGRHQ 601 



Query: 602 GNNQEFPSSIVDFVDVSPKQVVAVATACIPFLENDDSNFALMGANMQRQAVPLIDPKAPY 661 

N +D++DVSPKQW+VATACIPFLEI-CDSNRALMGANMQRQAVPL+ P+AP+ 

Sbjct: 599 SENLRVEKERIDYmvSPKQWSVATACIPFLFJvTODSNRALMGANMQRQAVPLtfflPEAPF 658 

Query: 662 VGTGMEYQAAHDSGAAVIAKHDGRVI FSDAEKVEVRRED GSLDVYHVQKFRR 713 

VGTGME+ +A DSGAAV AKHDG V +A ++ VRR G +D Y ++KF R 

Sbjct: 659 VGTGMEHVSAKDSGAAVTAKHDGIVEHVEAREIWVRRVSLVDGKEVTGGIDKYTLRKFVR 718

Query: 714 SNSGTAraQRTLVKVGDLVEKGDFIMGPSMENGEMALGQNPWA-YMTWEGYKFEDAVIM 773

SN GT YNQR V GD V KG+ + +GPSM++GE+ALG+N +VA+MTW+GYN+EDA+IM 
Sbjct: 719 SNQGTCYNQRPWAEGDRVVKGEILGNGPSMDSGELALGENVLVAFMTWDGYNYEDAIIM 778 



Sbjct: 779 SERLVKDDVYTS IHIEEFESEARDTKLGPEEMTRDI PNVGEDALRDLDERGI IRVGAEVK 838 

Query: 834 EGDILVGKVTPKGEKDLSAEERLLHAIFGDKSREVRDTSLRVPHGGDGWRDVKIFTRAN 893 

+ D+LVGKVTPKG +L+AEERLLHAIFG+K+REVRDTSLRVPHGG G+V DVKIFTR 
Sbjct: 839 DNDLLVGEWTPKGVTELTAEERLLHAIFGEKAREVRDTSLRVPHGGGGIVLDVKIFTREA 898 

Query: 894 GDELQSGVNMLVRVYIAQKRKIKVGDKMAGRHGNKGWSRIVPVEDMPYLPDGTPVDIML 953 

GDEL GVN LVRVYI QKRKI GDKMAGRHGNKGV+SRI+P EDMP++PDGTPVDIML 
Sbjct: 899 GDELPPGVNQLVRVYIVQKRKIHEGDKI'IAGRHGNKGVISRIIiPEEDMPFMPDGTPVDIML 958 

Query: 954 NPLGVPSRMNIGQVMELHLGMAARNLGIHIATPVFDGASSEDLWETVQEAGMDSDAKTVL 1013 

NPLGVPSRMNIGQV+ELHLGMAAR LGIH+ATPVFDGA4 ED+W TV+EAGM DAKT+L 
Sbjct: 959 NPLGVPSRMNIGQVLELHLGMAARALGIHVATPVFDGANEEDVWSTVEEAGMARDAKTIIi 1018 

Query: 1014 YDGRTGEPFDNRVSVGVMYMIKLHHM\'DDKLHARSVGPYSI J VTQQPLGGKAQFGGQRFGE 1073 

YDGR+GE FDNR+SVGVMYMIKL HMVDDKLHARS GPYSLVTQQPLGGKAQFGGQRFGE 
Sbjct: 1019 YDGRSGEAFDNRISVGVI^MIKLAHMVDDKLHRRSTGPYSLVTQQPLGGKAQFGGQRFGE 1078 

Query: 1074 MEVWALEAYGASNVLQEILTYKSDDVTGRLKAYEAITKGKPIPKPGVPESFRVLVKELQS 1133 

MEVWALEAYGA+ LQEILT KSDDV GR4K YEAI KG+ +P+PGVPESF+VL+KELQS 
Sbjct: 1079 MEVWALEAYGAAYTLQEILTIKSDDWGRVKTYEAIVKGESVPEPGVPESFKVLIKELQS 113B 

Query: 1134 LGLDMRVLDEDDNEVELRDLDEGEDDDVMHVDD 1166 

LG+D+++L D+ E+E+RD+D DDD + +D 
Sbjct: 1139 LGMDVKMLSADEEEIEMRDMD DDDFTNQND 1168 

A related DNA sequence was identified in S.pyogenes <SEQ ID 379> which encodes the amino acid 
sequence <SEQ ID 380>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3392 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 1129/1190 (94%) , Positives = 1168/1190 (97%) , Gaps = 3/1190 (0%) 

Query: 1 MAGHEVQYGKHRTRRSFSRIKEVLDLPNLISIQTDSFQDFLDAGLKEVFEDVLPISNFTD 60 

+AGHEV+YGKHRTRRSFSRIKEVLDLPNLIEIQTDSFQDFLD+GLKEVFEDVLPISNFTD 
Sbjct: 1 LAGHEVRYGKHRTRRSFSRIKSVLDLPNLIEIQTDSFQDFLDSGLKEVFEDVLPISNFTD 60 

Query: 61 TMDLEFVGYELKEPKYTLEEARIHDASYSAPIFVTFRLVNKETGEIKTQEVFFGDFPIMT 120 

TM+LEFVGYE KEPKYTLEEAR1HDASYSAPIFVTFRLVNKETGEIKTQEVFFGDFPIMT 
Sbjct: 61 TMELEFVGYEFKEPKYTLEEARIHDASYSAPIFVTFRLVNKETGEIKTQEVFFGDFPIMT 120 

Query: 121 EMGTFIINGGERIIVSQLWSPGWFITOKvDKWGKVGYGSTVIPNRGAWLELETDAKDIA 180 

EMGTFlINGGERIIVSQIiVRSPGVYFNDKVDKNGKVGYGSTVIPNRGAWLELETD+KDIA 
Sbjct: 121 EMGTFIINGGERIIVSQLWSPGvYFNDKVDKNGKVGYGSTVIPNRGAWLELETDSKDIA 180 

Query: 181 YTRIDRTRKIPFTTLVRALGFSGDDEIVDIFGDSELVRNTIEKDIHKNPSDSRTDEALKE 240 

YTRIDRTRKIPFTTLvRALGFSGDDEIVDIFG+S+LVRNTIEKDIHKNPSDSRTDEALKE 
Sbjct: 181 YTRIDRTRKIPFTTLVRALGFSGDDEIVDIFGESDLVRNTIEKDIHKNPSDSRTDEALECE 240 

Query: 241 IYERLRPGEPKTADSSRSLLVARFFDPRRYD]^VGRYKINKKLNLKTRLUJQTIAENLV 300 

IYERLRPGEPKTADSSRSLL+ARFFD RRYDLAAVGRYK+NKKLN+KTRLLNQ IAENLV 
Sbjct: 241 IYFJ^RPGEPKTADSSRSLLIARFFDARRYDT^VGRVK^KKLNIKTRLLNQIIAFJ^V 300

Query: 301 DGETGEILVEAGlWITRDVIDSIAEHIEGDLNKPVyTPIOYAWT3PVILQKFKVVAPTD 360

D ETGEILVEAGT MTR VI+SI EH+DGDRJKFVYTPNDYAWTEPV+LQKFKW+P D 
Sbjct: 3 01 DAETGEILVEAGTEMTRSVIESIEEHLDGDLNKFVYTPNDYAWTEPVVIjQKFKWSPlD 360 

Query: 361 PDRVVTIVGNSNPEDIWRALTPADILAEMSYFIJJIAEGIGKVDDIDHLGNRRIRAVGELL 420 

PDRVVTIVGN+NP+DKVRALTPADIIAEMSYFIJSnjAEG+GKVDD 
Sbjct: 361 PDRVWIVGNANPDDKVRALTPADILAEMSYFIiNIiAEGLGKVDDIDHLGNRRIRAVGELL 420 

Query: 421 ANQFRIGIARMERNTOERMSVQDNEVLTPQQIIH1RPVTAAVKEFFGSSQLSQFMDQHNP 480 
ANQFRIGLARMERNVRERMSVQDN+VLTPQQIINIRPVTAAVKEFFGSSQLSQFMDQHNP 

Sbjct: 421 
Query: 481 
Sbjct: 481 

Query: 541 LNKXGFIQTPYRKVDRSTGAVTNEIVWLTADEEDEFTVAQANSIONEDGTFAEEIVMGRH 600 

LNKYGFIQTPYRKVDR+TG VTNEIWLTADEEDE+TVAQANSKLNEDGTFAEEIVMGRH 
Sbjct: 541 LNKYGFIQTPYRKVDRATGTVTNEIVWLTADEEDEYTVAQANSKLNEDGTFAEEIVMGRH 600 

Query: 601 QGNNQEFPSSIVDFVDVSPKQWAVATACIPFLENDDSNRALMGANMQRQAVPLIDPKAP 660 

QGNNQEF + S + VD FVDVS PKQWAVATAC I PFLENDDSNRALMGANMQRQAVPL I DPKAP 
Sbjct: 601 QGNNQEFSASWDFVDVSPKQWAVATACIPFLENDDSNRALMGANMQRQAVPLIDPKAP 660 



Query: 721 NQRTLVKVGDLVEKGDFIADGPSMENGEMAMQNPVVAYMTWEGYNFEDAVIMSERLVKE 780 

NQRTLVKVGD+VEKGDFIADGPSMENGEMAIGQNPVVAYMTWEGYNFEDAVIMSERLVKE 
Sbjct: 721 NQRTLVKVGDIVEKGDFIADGPSMENG3MALGQNPWAYMTWEGYNFEDAVIMSERLVKE 780 

Query: 781 DVYTSVHLEEFESETRDTKLGPEEITRE1PNVGEDSLRDLDEMGIIRIGAEVKEGDILVG 840 

DVYTSVHI1EEFESETRDTKLGPEEITREIPNVGE++L+DLDEMGIIRIGAEVKEGDILVG 
Sbjct: 781 DVYTSVHLEEFESETRDTKLGPEEITREIPNVGEEALKDLDEMGIIRIGAEVKEGDILVG 840 

Query: 841 KVTPKGEKDLSAEERLLHAI FGDKSREVRDTSLRVPHGGDGWRDVKI FTRANGDELQSG 900 

KVTPKGEKDLSAEERLLHAIFGDKSREVRDTSLRVPHGGDG+VRDVKIFTRANGDELQSG 
Sbjct: 841 KVTPKGEKDLSAEERLLHAIFGDKSREVRDTSLRVPHGGDGIVRDVKI FTRANGDELQSG 900 

Query: 901 VMyiLVRVYIAQKRKIKVGDKMAGRHGNKGWSRIVPVEDMPYLPDGTPVDIMLNPLGVPS 960 

VNMLVRVYIAQKRKIKVGDKMAGRHGNKGWSRIVPVEDMPYLPDGTPVD1MLNPLGVPS 
Sbjct: 901 VMMLVRVYIAQKRKIKVGDKMAGRHGNKGWSRIVPVEDMPYLPDGTPVDIMLKPLGVPS 960 

Query: 961 RMKIGQWELHLG^IAAR1^GIHIATPVFDGASSEDLWETVQEAGMDSDAKTVLYDGRTGE 1020 

RMN1GQVMELHLGMAAR[^GIHIATPVFDGASSEDLW+TV+EAGMDSDAKTVLYDGRTGE 
Sbjct: 961 RMNIGQVMELHLGMAi«mGIHIATPVFDGASSEDLWDTVREAGMDSDAKTVLYDGRTGE 1020 

Query: 1021 PFDNRVSVGVMYMIKLHHMVDDKLHARSVGPYSLVTQQPLGGKAQFGGQRFGEMEVWALE 1080 

PFDNRVSVGVMYMIKLHHMVDDKLHARSVGPYSLVTQQPLGGKAQFGGQRFGEMEVWALE 
Sbjct: 1021 PFDNRVSVGVMYMIKLHHMVDDKLHARSVGPYSLVTQQPLGGKAQFGGQRFGEMEVWALE 1080 

Query: 1081 AYGASNVLQEILTYKSDDVTGRLKAYEAITKGKPIPKPGVPESFRVLVKELQSLGLDMRV 1140 

AYGASNVLQEILTYKSDDVTGRLKAYEA1TXC-KPIPKPGVPESFRVLVKELQSLGLDMRV 
Sbjct: 1081 AYGftSira,QEILTYKSDDVTGRLKAYEAITKGKPIPKPGVPESFRVLVKELQSLGLDMRV 1140 

Query: 1141 LDEDDNEVELRDLDEGEDDDVMHVDDLEKARVKQEAEEKQAEQVSEWQE 1190 

LDEDDNEVELRDLDEGEDDD+MHVDDLEKAR KQ E ++VSE E 
Sbjct: 1141 LDEDDHEVELRDLDEGEDDDIMHVDDLEKAREKQAQE TQEVSETTDE 1187 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 114

A DNA sequence (GBSx0118) was identified in S.agalactiae <SEQ ID 381> which encodes the amino acid
sequence <SEQ ID 382>. This protein is predicted to be DNA-directed RNA polymerase, beta subunit 
(rpoC). Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1892 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 383> which encodes the amino acid 
sequence <SEQ ID 384>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2128 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 1148/1205 (95%), Positives = 1177/1205 (97%) 

Query: 11 VVDVNRFKSMQITLASPSKVRSWSYGEVKKPETINYRTLKPEREGLFDEV1FGPTKDWEC 70
       WDVNRFKSMQITIiASPSKVRSWSYGEVKKPETINyRTLKPEREGLFDEVIFGPTKDWEC
Sbjct: 1 VVDVNRFKSMQITLASPSKVRSWSYGEVKKPETINYRTLKPEREGLFDEVIFGPTKDWEC 60








SPRALEEVIYFAAYWIDP DTPLEPKSLLTEREYREKLQEYG+GSFVAKMGAEAIQDLL 



KRVDL AEIA LKEELKSA+GQKR+KAVRRIjDVLDAF KSC-NKPEWMVLNILPVIPPDLR 



EMAIELFKPFVMREIVA++ AGNVKAAKRMVERGDERIWDI LEEVIKEHPVLLNRAPTLH 



RLGIQAFEPVLIDGKALRLHPLVCEAYNADFDGDQMAIHVPLSEEAQAEARLLMLAAEHI 490 
RLGIQAFEPVLIDGKALRLHPLVCEAYNADFDGI^MAIHVPLSEEAQAEARLLMIAAEHI 
RLGIQAFEPVLIDGKALRLHPLVCEAYNADFDGDQMAIHVPLSEEAQAEARLLMLAAEHI 480 



LNPKDGKPWTPSQDMVLGNYYLTMEDAGREGEGMIFKD DEAVMAY+NGY HLH+RVGI 



Query: 551 AVDSMPNKPWTEEQKHKIMVTTVGKILFNDIMPEDLPYLIEPNKANLTEICTPDKYFLEPG 6
AVDSMPNKPW + Q+HKIMVTTVGK1I1FNDIMPEDLPYL EPNNANLTE TPDKYFLEPG



Query: 611 QDIQAVIDNLEINIPFKKKNLGNIIAETFKRFRTTETSAFLDRLKDLGYYHSTLAGLTVG 670 

QDIQ VID L+IN+PFKKKNLGNIIAETFKRFRTTETSAFLDRLKDLGYYHSTLAGLTVG 
Sbjct: 601 QDIQEVIDRLDI3WPFKKKNLGNIIAETFKRFRTTETSAFLDRLKDLGYYHSTIAGLTVG 660 

Query: 671 IADI PVIDNKAEI IDAAHHRVEDINKAFRRGLMTEEDRYVAVTTTWREAKEALEKRLIET 730 

IADI PVIDNKAEI IDA?VHHRVE+INKAFRRGU1T++DRYVAVTTTWREAKEALEKRLIET 
Sbjct: 661 IADIPVIDNKAEIIDAAHHRVEEINKAFRRGLMTDDDRWAVTTTWREAKEALEKRLIET 720 

Query: 731 QDPKNPIVMMMDSGARGNISNFSQLAGMRGLMAAPNGRIMELPILSNFREGLSVLEMFFS 790 



Query: 791 THGARKGMTDTALKTADSGYLTRRLVDVAQDVIIREDDCGTDRGLTITAITDGKEVTETL 850 

THGARKGMTDTALKTADSGYLTRRLVDVAQDVIIREDDCGTDRGL I AITDGKEVTETL 
Sbjct: 7B1 THGARKGMXDTALKTADSGYLTRRLVDVAQDVIIREDDCGTDRGLLIRftlTDGKEVTETL 840 

Query: 851 EERLIGRYTKK3IKHPETGEILVGADTLITEDMAAKWKAGVEEVTIRSVFTCNTRHGVC 910 

EERL GRYT+ KS + KHPETGE+L+GAD LITEDMA K+V AGVEEVTIRSVFTC TRHGVC 
Sbjct: 841 EERLQGRYTRKSVKHPETGEVLIGADQLITEDMARKIVDAGVEEVTIRSVFTCATRHGVC 900 

Query: 911 RHCYGINLftTGDAVEVGEAVGTIAAQSIGEPGTQLTMRTFHTGGVASNTDITQGLPRIQE 970 

RHCYGINLATGDAVEVGEAVGTIAAQSIGEPGTQLTMRTFHTGGVASNTDITQGLPRIQE 
Sbjct: 901 RHCYGINLATGDAVEVGEAVGTIAAQSIGEPGTQLTMRTFHTGGVASNTDITQGLPRIQE 960 

Query: 971 IFEARNPKGFAVITEVKGEWAIEEDSSTRTKKVFVKGQTGEGEYWPFTARMKVEVGDE 1030 

I FEARNPKGEAVI TEVKG W IEED+STRTKKV+V+G+TG GEYV+PFTARMKVEVGDE 
Sbjct: 961 IFEARNPKGFAVITEYKGMVVEIEEDASTRTKKVYVQGKTGMGEYVIPFTARMKVEVGDE 1020 

Query: 1031 VARGAALTEGSIQPKRIiLEVRDTLSVETYLLAEVQKVYRSQGVEIGDKHVEVMWQMLRK 1090 

V RGAALTEGSIQPKRLLEVRDTLSVETYLIAEVQCTYRSQ^SVEIGDKHVEWIVRQMLRK 
Sbjct: 1021 VNRGAALTEGSIQPKRLLEVRDTLSWTYLLAEVQKVYRSQ/3VEIGDKHVEVMVRQMLRK 1080 

Query: 1091 VRVMDPGDTDLLPGTLMDISDFTDANKDIVISGGIPATSRPVLMGITKASLETNSFLSAA 1150 

VRVMDPGDTDLLPGTLMDISDFTDANKDIVISGGIPATSRPVLMGITKRSLETNSFLSAA 
Sbjct: 1081 VRVMDPGDTDLLPGTLMDISDFTDANKDIVISGGIPATSRPVLMGITKASLETNSFLSAA 1140 

Query: 1151 SFQETTRVLTDAAIRGKKDHLLGLKENVI IGKI I PAGTGMARYRNIEPLAVNEVEI IEGT 1210 
SFQETTRVLTDAAIRGKKDHLLGLKENVIIGKIIPAGTGMARYRNIEP A+NE+E+I+ T 

Query: 1211 PVDAE 1215 
V AE 

Sbjct: 1201 EVSAE 1205 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
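
The "Identities" figure quoted with each alignment is the number of aligned columns in which both sequences carry the same residue, taken over the full alignment length including gap columns. The Python sketch below counts these columns for a pair of already-aligned strings. The function name and the toy sequences are hypothetical and are not taken from any SEQ ID herein.

```python
def count_identities(query, subject, gap="-"):
    """Count identical columns and gap columns in two aligned sequences.

    Both strings must already be aligned, i.e. have equal length, with the
    gap character marking insertions or deletions.
    Returns (identical_columns, gap_columns, alignment_length).
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != gap)
    gaps = sum(1 for q, s in zip(query, subject) if q == gap or s == gap)
    return identical, gaps, len(query)


if __name__ == "__main__":
    # Toy example only (illustrative strings, not real alignment rows).
    identical, gaps, length = count_identities("MLDIL-YDF", "MLSLLSYDF")
    print(f"Identities = {identical}/{length}, Gaps = {gaps}/{length}")
```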

Example 115 

A DNA sequence (GBSx0120) was identified in S.agalactiae <SEQ ID 385> which encodes the amino acid 
sequence <SEQ ID 386>. This protein is predicted to be a DNA binding protein. Analysis of this protein 
sequence reveals the following: 

Possible site: 19 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4727 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

Query:

MY+ W+M+GD+EPWWF+ +GWE DI + ++ +AL +++ +W + + ++ ++S+S L 
Sbjct: 1 ^ra^VVEMYGDFEPWWFLDGWENDIIQEQRFEKYYDALKFYKIQWLKLETEFKEYKSRSDL 60 

Query: 61 LATFWS I KEKRWCEECDEYLCjQYHSLMLLKEWQEI PKEE 99 

+ FW+ ++RWCEECD+Y+QQY S++LL4-+ + IPK + 
Sbjct: 61 MTVFWNENDQRWCEECDDYVQQYRSI ILLEDEKVIPKSK 99 

A related DNA sequence was identified in S.pyogenes <SEQ ID 387> which encodes the amino acid
sequence <SEQ ID 388>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4741 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 
Identities = 61/121 (50%) , Positives = 83/121 (68%) 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
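
The "Identities"/"Positives"/"Gaps" summary lines that head each alignment can be read into structured values for checking or tabulation. The following is a minimal sketch only: the function name `parse_blast_summary` and the regular expression are illustrative assumptions, not part of the analysis described here, and the pattern is deliberately tolerant of the stray spaces that appear before commas in these listings.

```python
import re

# Hypothetical helper: matches "Identities = 61/121 (50%)" style fields,
# allowing optional whitespace around every token.
_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_blast_summary(line):
    """Return {field: (count, alignment_length, reported_percent)}."""
    return {
        name.lower(): (int(count), int(length), int(percent))
        for name, count, length, percent in _FIELD.findall(line)
    }


# Summary line of the GAS/GBS alignment in Example 115.
summary = parse_blast_summary("Identities = 61/121 (50%) , Positives = 83/121 (68%)")
identical, length, reported = summary["identities"]
print(f"{identical}/{length} = {100 * identical / length:.1f}% (reported {reported}%)")
```

The recomputed fraction is printed next to the reported percentage so that the two can be compared by eye; no rounding rule is assumed for the reported value.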

Example 116 

A DNA sequence (GBSx0121) was identified in S.agalactiae <SEQ ID 389> which encodes the amino acid 
sequence <SEQ ID 390>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2433 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC45310 GB:U81957 putative ABC transporter subunit ComYA
[Streptococcus gordonii] 
Identities = 203/319 (53%) , Positives = 255/319 (79%) , Gaps = 1/319 (0%) 

Query: 1 MVQSLAKQVIHQAVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKF 60

MVQ +A+ ++ QA E AQDIY +PK DCYELYMRI DERRFI ++F+++A++ISHFKF 
Sbjct: 1 IWQKIAQAIVRQAKEECAQDIYFVPKDDCYELYT'IRIGDERRFIQTYTJFDQLAAVISHFKF 60 

Query: 61 VAGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFD 120 

+AGMNVGEKRRSQLGSCDY + + S+RLS+VGDYRG ESLVIR+L+ +LK+WF 
Sbjct: 61 IAGMNVGEKRRSQLGSCDYRYDD-KETSIRLSTVGDYRGYESLVIRLLHDEETELKFWFT 119

Query: 121 NIKQMKEVLGIRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQ 180

Query: 181 LNEDIGMTYDALIKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSIPGV 240

USE IG+TY++LIKLSLRHRPD+LIIGEIRD TARAV+RASLTG VFSTIHAKSIPGV 
Sbjct: 180 LJSFETIGLTYESLIKLSLRHRPDLLIIGEIRDSETARAVVRASIiTGATVFSTIHAKSIPGV 23 9 

Y+RL+ELGV+ +EL+ L+ I YQRLIGGG +IDF + N+++H WN+Q+D L GH 
Sbjct: 240 YERLLELGVSEEELKIVLQGICYQRLIGGGGVIDFASDNYQEHEPTVWMQQIDQLLAAGH 299 

Query: 301 ISKKQAQVEKIIPQETTES 319 

I +QA+ EKI Q+ S 
Sbjct: 300 IHPEQAEAEKIRNQQAKTS 318 

A related DNA sequence was identified in S.pyogenes <SEQ ID 391> which encodes the amino acid 
sequence <SEQ ID 392>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1846 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 207/312 (66%) , Positives = 257/312 (82%) 

Query: 1 MVQSLAKQVIHQAVEVNAQDIYIIPKGDCYELYMRIDDERRFIDVFEFNRMASLISHFKF 60

MVQ+LAK ++ +A +V+AQDIYI+P+ D Y+L++RI DERR +DV++ +RMA LISHFKF 
Sbjct: 1 MVQALAKAILAKAEQVHAQDIYILPRADQYDLFLRIGDERRLVDVYQSDRMAPLISHFKF 60 

Query: 61 VAGMNVGEKRRSQLGSCDYELSEGRLVSLRLSSVGDYRGQESLVIRILYSGHQDLKYWFD 120 

VAGM VGEKRR Q+GSCDY+LS+ + +SLRLSSVGDYRGQESLVIR+L+ ++ + YWFD 
Sbjct: 61 VAGMIVGEKRRCQVGSCDYKLSKDKQLSLRLSSVGDYRGQESLVIRLLHHQNKSVHYWFD 120 

Query: 121 NIKQMKEVLGIRGLYLFSGPVGSGKTTLMYQLASEVFKNKQIITIEDPVEIKNDKMLQLQ 180 

+ ++ +G RGLYLF+GPVGSGKTTLMYQL S + Q+I+IEDPVEIKN ++LQLQ 
Sbjct: 121 GLTKVANQVGGRGLYLFAGPVGSGKTTLMYQLISNYHQEAQVISIEDPVEIKNHQILQLQ 180

Query: 181 LNEDIGMTYDALIKLSLRHRPDILIIGEIRDQATARAVIRASLTGVMVFSTIHAKSIPGV 240

+N+DIGMTYD LIKLSLRHRPDIL+IGEIRD TARAVIRASLTG MVFST+HAKSI GV
Sbjct: 181 VNDDIGMTYDNLIKLSLRHRPDILVIGEIRDSQTARAVIRASLTGAMVFSTVHAKSISGV 240

Query: 241 YDRLIELGVNYQELENSLKLIAYQRLIGGGSLIDFETGNFKKHSSDECWNRQVDILAEEGH 300 

Y RL+ELGV EL N L LIAYQRL+ GG+LID F+ +SS WN+Q+D L E GH 

Sbjct: 241 YARLLELGVTKAELSNCLALIAYQRLLNGGALIDSTQNEFEYYSSSNWNQQIDQLLEAGH 300

Query: 301 ISKKQAQVEKII 312 

++ KQA++EKII 
Sbjct: 301 LNPKQAKLEKII 312

SEQ ID 390 (GBS63) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 5 (lane 5; MW 39kDa). It was also expressed in E.coli as a GST-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 13 (lane 2; MW 64kDa). 

The GBS63-GST fusion product was purified (Figure 101A; see also Figure 191, lane 3) and used to 
immunise mice (lane 1 product; 20 µg/mouse). The resulting antiserum was used for Western blot (Figure
101B), FACS (Figure 101C), and in the in vivo passive protection assay (Table III). These tests confirm
that the protein is immunoaccessible on GBS bacteria and that it is an effective protective immunogen. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 
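
The "Final Results" blocks list one certainty value per candidate cellular location, and the location with the highest certainty is the one marked "(Affirmative)". As an illustration only, the sketch below reads such a block and picks the top-scoring location. The names `parse_final_results` and `most_likely_site` are hypothetical, and the pattern assumes the line layout shown in these listings; it also strips spaces inside the number, since values such as "0. 4727" occur in transcribed copies.

```python
import re

# Hypothetical pattern for lines such as
# "bacterial cytoplasm --- Certainty=0.4727 (Affirmative) < succ>"
_RESULT = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([0-9.\s]+?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Map each location to (certainty, verdict)."""
    results = {}
    for site, value, verdict in _RESULT.findall(text):
        # Remove any whitespace that crept into the number during transcription.
        results[site] = (float(re.sub(r"\s+", "", value)), verdict.strip())
    return results


def most_likely_site(results):
    """Return the location with the highest certainty value."""
    return max(results, key=lambda site: results[site][0])


# Values taken from the GBS protein of Example 115.
block = """
bacterial cytoplasm --- Certainty=0.4727 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
results = parse_final_results(block)
print(most_likely_site(results), results)
```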

Example 117 

A DNA sequence (GBSx0122) was identified in S.agalactiae <SEQ ID 393> which encodes the amino acid 
sequence <SEQ ID 394>. This protein is predicted to be competence protein (mshG). Analysis of this 
protein sequence reveals the following: 

Possible site: 49 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-14.65 Transmembrane 123 - 139 ( 113 - 144) 

INTEGRAL Likelihood =-13.53 Transmembrane 272 - 288 ( 264 - 295) 

INTEGRAL Likelihood = -8.55 Transmembrane 79 - 95 ( 75 - 102) 

Likelihood = -0.00 Transmembrane 146 - 162 ( 146 - 162) 






Final Results 

bacterial membrane --- Certainty=0.5859 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9489> which encodes amino acid sequence <SEQ ID 9490> 
was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC45311 GB:U81957 putative ABC transporter subunit ComYB 
[Streptococcus gordonii] 
Identities = 161/280 (57%) , Positives = 219/280 (77%) 


Query: 19 MNKALLEGKDLSKMLGELGFSDTVITQVALADLHGNISRSLLKIESYLANLLLVRKKVIE 78 

M + L G+ S+++ LGFSD V+TQ++LA+LHGN+S +LLKIE YL NL V+KK+IE 
Sbjct: 1 MRQGLANGQAFSEIMASLGFSDAVVTQLSLAELHGNLSLALLKIEEYLDNLAKVKKKLIE 60

Query: 79 VATYPLILLSFLVLIMIGLRNYLMPQLGENNFATRLITNVPNIFLLLLAVVLIFSLIFYI 138

VATYP++LL FLVLIMIGLRNYL+PQL NFAT+LI ++P IFLL + ++L + Y+ 
Sbjct: 61 VATYPMMLLGFLVLIMIGLRNYLLPQLSSQNFATQLIGHLPTIFLLTVLMLLGLTGAIYL 120 

Query: 139 IQKKLSRIKVACFLTTIPLVGSYVKLYLTAYYAREWGNLLSQGIELDQIVKVMQNQKSKL 198 
+ K RI V FL +P VGS+V++YLTAYYAREWGN++ QG+EL QI ++MQ Q+S L

Sbjct: 121 VFKGQKRIPVYSFLARLPFVGSFVRIYLTAYYAREWGNMIGQGLELSQIFQIMQEQRSVL 180 

Query: 199 FREIGYDMEEGFLSGKAFHQKVLDYPFFLTELSLMIEYGQVKAKLGTELDIYADEKWEDF 258 
F+EIG D+ + +G+ F K+ YPFF ELSL+IEYG+VK+KLG+EL+IYA + WE+F
Sbjct: 181 FQEIGQDLGQALQNGQEFSDKIASYPFFKKELSLIIEYGEVKSKLGSELEIYALKTWEEF 240

Query: 259 FTKLARATQLIQPVIFIFVALIIVMIYAAMLLPMYQNMEI 298
F ++ R LIQP++F+FVAL+IV++YAAMLLP+YQNME+
Sbjct: 241 FGRVNRTMNLIQPLVFVFVALMIVLLYAAMLLPLYQNMEV 280



A related DNA sequence was identified in S.pyogenes <SEQ ID 395> which encodes the amino acid 
sequence <SEQ ID 396>. Analysis of this protein sequence reveals the following: 

Possible site: 43 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-12.52 Transmembrane 317 - 333 ( 309 - 339)
INTEGRAL Likelihood =-10.14 Transmembrane 123 - 139 ( 119 - 147)
INTEGRAL Likelihood = -6.95 Transmembrane 164 - 180 ( 161 - 183)

Final Results

bacterial membrane --- Certainty=0.6010 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAC45311 GB:U81957 putative ABC transporter subunit ComYB 
[Streptococcus gordonii] 
Identities = 139/278 (50%) , Positives = 207/278 (74%) 

Query: 63 MEESLLKGQGLADMLSGLGFSDAILTQISLADRHGNIETTLVAIQHYLNQMAEIRRKTVE 122

M + L GQ +++++ LGFSDA++TQ+SLA+ HGN+ L+ I+ YL+ +A++++K +E
Sbjct: 1 MRQGLANGQAFSEIMASLGFSDAVVTQLSLAELHGNLSLALLKIEEYLDNLAKVKKKLIE 60

Query: 123 VITYPLILLLFLFVMMLGLRRYLVPQLETQNQITYFLNHFPAFFIGFCSGLILLFGMVWL 182

V TYP++LL FL ++M+GLR YL+PQL +QN T + H P F+ L+ L G ++L 

Sbjct: 61 VATYPMMLLGFLVLIMIGLRNYLLPQLSSQNFATQLIGHLPTIFLLTVLMLLGLTGAIYL 120 

Query: 183 RWRSQSRLKLYSRLSRYPFLGKLLKQYLTSYYAREWGTLIGQGLDLMTILDIMAIEKSSL 242

++ Q R+ +YS L+R PF+G ++ YLT+YYAREWG +IGQGL+L I IM ++S L 
Sbjct: 121 VFKGQKRIPVYSFLARLPFVGSFVRIYLTAYYAREWGNMIGQGLELSQIFQIMQEQRSVL 180 

Query: 243 MKELAEDIRMSLLEGQAFHIKVATYPFFKKELSLMIEYGEIKSKLGAELEIYAQESWEQF 302 

+E+ +D+ +L GQ F K+A+YPFFKKELSL+IEYGE+KSKLG+ELEIYA ++WE+F
Sbjct: 181 FQEIGQDLGQALQNGQEFSDKIASYPFFKKELSLIIEYGEVKSKLGSELEIYALKTWEEF 240 

Query: 303 FSQLYQVTQLIQPAIFLVVAVTIVMIYAAILLPIYQNM 340

F ++ + LIQP +F+ VA+ IV++YAA+LLP+YQNM 
Sbjct: 241 FGRVNRTMNLIQPLVFVFVALMIVLLYAAMLLPLYQNM 278 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 148/297 (49%), Positives = 209/297 (69%), Gaps = 2/297 (0%) 

Query: 1 MVTFLKRSKLLSDCYTDSMNKALLEGKDLSKMLGELGFSDTVITQVALADLHGNISRSLL 60

++ FLKRS+LL Y M ++LL+G+ L+ ML LGFSD ++TQ++LAD HGNI +L+
Sbjct: 45 VIAFLKRSQLLQLDYVLKMEESLLKGQGLADMLSGLGFSDAILTQISLADRHGNIETTLV 104

Query: 61 KIESYLANLLLVRKKVIEVATYPLILLSFLVLIMIGLRNYLMPQLGENNFATRLITNVPN 120

I+ YL + +R+K +EV TYPLILL FL ++M+GLR YL+PQL N T + + P
Sbjct: 105 AIQHYLNQMARIRRKTVEVITYPLILLLFLFVMMLGLRRYLVPQLETQNQITYFLNHFPA 164

Query: 121 IFL-LLLAVVLIFSLIFYIIQKRLSRIKVACFLTTIPLVGSYVKLYLTAYYAREWGNLLS 179

F+ ++L+F ++ ++ + SR+K+ L+ P +G +K YLT+YYAREWG L+
Sbjct: 165 FFIGFCSGLILLFGMV-WLRWRSQSRLKLYSRLSRYPFLGKLLKQYLTSYYAREWGTLIG 223

Query: 180 QGIELDQIVKVMQNQKSKLFREIGYDMEEGFLSGKAFHQKVLDYPFFLTELSLMIEYGQV 239

QG++L I+ +M +KS L +E+ D+ L G+AFH KV YPFF ELSLMIEYG++
Sbjct: 224 QGLDLMTILDIMAIEKSSLMKELAEDIRKSLLEGQAFHIKVATYPFFKKELSLMIEYGEI 283

Query: 240 KAKLGTELDIYADEKWEDFFTKLARATQLIQPVIFIFVALIIVMIYAAMLLPMYQNM 296

K+KLG EL+IYA E WE FF++L + TQLIQP IF+ VA+ IVMIYAA+LLP+YQNM
Sbjct: 284 KSKLGAELEIYAQESWEQFFSQLYQVTQLIQPAIFLVVAVTIVMIYAAILLPIYQNM 340



A related GBS gene <SEQ ID 8493> and protein <SEQ ID 8494> were also identified. Analysis of this
protein sequence reveals the following:



Lipop: Possible site: -1 Crend: 9
SRCFLG: 0
McG: Length of UR: 2
Peak Value of UR: 1.24
Net Charge of CR: 0
McG: Discrim Score: -8.94
GvH: Signal Score (-7.5): -4.08

Possible site: 31
>>> Seems to have no N-terminal signal sequence
Amino Acid Composition: calculated from 1
ALOM program count: 4 value: -14.65 threshold:
INTEGRAL Likelihood =-14.65
INTEGRAL Likelihood =-13.53
INTEGRAL Likelihood = -8.55 Transmembrane
PERIPHERAL Likelihood = 5.09 14
modified ALOM score: 3.43
icm1 HYPID: 7 CFP: 0.686

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.6859 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

57.5/79.7% over 279aa

Streptococcus gordonii

GP|2058545| putative ABC transporter subunit ComYB Insert characterized

ORF00008(355 - 1194 of 1500)

GP|2058545|gb|AAC45311.1| |U81957(1 - 280 of 282) putative ABC transporter subunit ComYB
{Streptococcus gordonii}
%Match =33.8

%Identity =57.5 %Similarity =79.6

Matches = 161 Mismatches = 57 Conservative Sub.s = 62

144 174 204 234 264 294 324 354 

TLRQVILKNTHQTSGIDKWISWLKKDISVRNRHKSKKLSLKKQRKOTQLFNNLFASGFSLTDMVTFLKRSKLLSDCYTDS

384 414 444 474 504 534 564 594 

MNEQUjLEGIODLSKMLGELGFSDTVITQVAIADLHGNISRSLLKIESYLANLLLVRKKVIEVATYPLILLSFLVLIMIGLR 

hi I: =1 = = = INI! hlhHhlllhl = 11111 II II I = I h 1 1 1 1 1 1 1 = = 1 1 llllllllll 

MRQGLANGQAFSEIMASLGFSDAVVTQLSLAELHGNLSLALLKIEEYLDNLAKVKKKLIEVATYPMMLLGFLVLIMIGLR
10 20 30 40 50 60 70 80 

624 654 684 714 744 774 804 834 

NYI iMPQLGENNFATRLITNVPNIFLLLI^VVLIFSLIFYIIQKRLSRIKVACFLTTIPLVGSYVKLYLTAYYAREWGNLL 
|| |:| | | lllhll ::| llll : =:| := h= I II I II : h I I h h : I I I I I 1 I I I I I h =

NYLLPQLSSQNFATQLIGHLPTIFLLTVLMLLGLTGAIYLVFKGQKRIPVYSFLARLPFVGSFVRIYLTAYYAREWGNMI 
90 100 110 120 130 140 150 160 

864 894 924 954 984 1014 1044 1074 

SQGIELDQIVKVMQNQKSKLFREIGYDMEEGFLSGKAFHQKVLDYPFFLTELSLMIEYGQVKAKLGTELDIYADEKWEDF
Ihll II = = ll hi Ihlll h : = =h I h llll lllhlllhlhllhlhlll : Ihl 
GQGLELSQIFQIMQEQRSVLFQEIGQDLGQALQNGQEFSDKIASYPFFKKELSLIIEYGEVKSKLGSELEIYALKTWEEF 
170 180 190 200 210 220 230 240 

1104 1134 1164 1194 1224 1254 1284 1314

FTKIjARATQLIQPVIFIFVALIIVMIYAAMLLPKYQfMEILS*KIYC*NVRIRRLKHLHF*NVW*HWLQSQELY*FIKD* 
I - I lllh = l = lllhlh = llllllhlllll = 
FGRVNRTMNLIQPLVFVFVALMIVLLYAAMLLPLYQNMEVHL 
250 260 270 280 


SEQ ID 8494 (GBS49) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 11 (lane 5; MW 15kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 15 (lane 5; MW 60kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.
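
The "INTEGRAL Likelihood ... Transmembrane" lines in the membrane-protein analyses above each give a score and two residue ranges. A minimal sketch of turning those lines into records is shown below. The `Segment` type and `parse_segments` function are hypothetical names introduced for illustration, and reading the parenthesised range as a wider boundary estimate around the first range is an assumption about the output layout rather than something stated in this document.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float   # more negative values are listed first in the output
    core_start: int     # first range on the line
    core_end: int
    wide_start: int     # parenthesised range (assumed to be the wider estimate)
    wide_end: int


_SEGMENT = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text):
    """Return every complete INTEGRAL line in text as a Segment."""
    return [
        Segment(float(score), int(a), int(b), int(c), int(d))
        for score, a, b, c, d in _SEGMENT.findall(text)
    ]


# Lines from the GBS protein of Example 117.
listing = """
INTEGRAL Likelihood =-14.65 Transmembrane 123 - 139 ( 113 - 144)
INTEGRAL Likelihood =-13.53 Transmembrane 272 - 288 ( 264 - 295)
INTEGRAL Likelihood = -8.55 Transmembrane 79 - 95 ( 75 - 102)
"""
segments = sorted(parse_segments(listing), key=lambda s: s.core_start)
for seg in segments:
    print(seg.core_start, seg.core_end, seg.likelihood)
```

Sorting by start position gives the segments in sequence order, which is usually more convenient than the score order used in the listing. Lines whose coordinates are incomplete are skipped rather than guessed.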

Example 118 

A DNA sequence (GBSx0123) was identified in S.agalactiae <SEQ ID 397> which encodes the amino acid 
sequence <SEQ ID 398>. This protein is predicted to be ComYD or ComGD. Analysis of this protein 
sequence reveals the following: 



WO 02/34771 



-193- 



PCT/GB01/04789 



>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA75315 GB:Y15043 homology to ComYD from Streptcoccus gordonii,

and ComGD from Bacillus subtilis [Lactococcus lactis subsp. cremoris] 
Identities = 56/138 (40%) , Positives = 92/138 (66%) , Gaps = 2/138 (1%) 

Query: 12 KVKAFTLLECLVALVTITGALLVYQGLTKLLAQQIVVMSSSSQSEWVLLTQQLNAEFEGA 71

K++AFTLLECLVAL+ I+G++LV GLT+++ +Q+ + + S+ +W + +Q+ +E GA
Sbjct: 13 KIRAFTLLECLVALLAISGSVLVISGLTRMIEEQMKISQNDSRKDWQIFCEQMRSELSGA 72

Query: 72 HLEYLRQNKLYLRKQDKIVTFGKSNKDDFRKTGYDGRGYQPMVYGLDNCQMSQTKSMVKL 131

L+ + QN LY+ K DK + FG DDFRK+ G+GYQPM+Y L ++ ++++K+
Sbjct: 73 KLDNVNQNFLYVTK-DKKLRFGLVG-DDFRKSDDKGQGYQPMLYDLKGAKIQAEENLIKI 130

Query: 132 VFYFKDGLKRTFYYDFKE 149 

F +G +R F Y F + 
Sbjct: 131 TIDFDNGGERVFIYRFTD 148 

A related DNA sequence was identified in S. pyogenes <SEQ ID 399> which encodes the amino acid 
sequence <SEQ ID 400>. Analysis of this protein sequence reveals the following: 

Possible site: 28 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAA75315 GB:Y15043 homology to ComYD from Streptcoccus gordonii, 

and ComGD from Bacillus subtilis [Lactococcus lactis subsp. cremoris] 
Identities = 65/137 (47%), Positives = 84/137 (60%), Gaps = 2/137 (1%) 

Query: 8 IKAFTLLEALIALLVISGSLLVYQGLTRTLLKHSHYLARHDQDNWLLFSHQLREELSGAR 67 

I+AFTLLE L+ALL ISGS+LV GLTR + + + +W +F Q+R ELSGA+ 

Sbjct: 14 IRAFTLLECLVALLAISGSVLVISGLTRMIEEQMKISQNDSRKDWQIFCEQMRSELSGAK 73 

Query: 68 FYKVADNKLYVEKGKKVLAFGQFKSHDFRKSASNGKGYQPMLFGISRSHIHIEQSQICIT 127 

V N LYV KKKLFG DFRKS G+GYQPML+ + +1 E++ I IT 

Sbjct: 74 LDNVNQNFLYVTKDKK-LRFG-LVGDDFRKSDDKGQGYQPMLYDLKGAKIQAEENLIKIT 131 

Query: 128 LKWKSGLERTFYYAFQD 144 

+ + +G ER F Y F D 
Sbjct: 132 IDFDNGGERVFIYRFTD 148 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 58/137 (42%) , Positives = 88/137 (63%) 

Query: 13 VKAFTLLECLVALVTITGALLVYQGLTKLLAQQIVVMSSSSQSEWVLLTQQLNAEFEGAH 72

+KAFTLLE L+AL+ I+G+LLVYQGLT+ L + ++ Q W+L + QL E GA
Sbjct: 8 IKAFTLLEALIALLVISGSLLVYQGLTRTLLKHSHYLARHDQDNWLLFSHQLREELSGAR 67

Query: 73 LEYLRQNKLYLRKQDKIVTFGKSNKDDFRKTGYDGRGYQPMVYGLDNCQMSQTKSMVKLV 132

+ NKLY+ K K++ FG+ DFRK+ +G+GYQPM++G+ + +S + + 
Sbjct: 68 FYKVADNKLYVEKGKKVLAFGQFKSHDFRKSASNGKGYQPMLFGISRSHIHIEQSQICIT 127 

Query: 133 FYFKDGLKRTFYYDFKE 149

+K GL+RTFYY F++ 
Sbjct: 128 LKWKSGLERTFYYAFQD 144 


A related GBS gene <SEQ ID 8495> and protein <SEQ ID 8496> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: 4.85
GvH: Signal Score (-7.5): -0.22

Possible site: 55
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 12.47 threshold: 0.0
PERIPHERAL Likelihood = 12.47 127
modified ALOM score: -2.99

*** Reasoning Step: 3

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

GP|3287181| homology to ComYD from Streptcoccus gordonii, and ComGD from Bacillus subtilis
{Lactococcus lactis subsp. cremoris} Insert characterized

ORF00009(334 - 747 of 1053)
GP|3287181|emb|CAA75315.1| |Y15043 (13 - 148 of 150) homology to ComYD from Streptcoccus
gordonii, and ComGD from Bacillus subtilis {Lactococcus lactis subsp. cremoris}
%Match =15.9

%Identity =40.6 %Similarity =68.1
Matches = 56 Mismatches = 42 Conservative Sub.s = 38

177 207 237 267 297 327 357 387 

IC**EVGGFFYKIS*SDPWTRYFYFCSSYHCYDLCSNAVTNVSKYGDIIMKNLLLKCKDK1CVKAFTLLECLVALVTIT 

MTMERKFCDLKLKIRAFTLLECLVALLAIS

10 20 30 

417 447 477 507 537 567 597 627 

GALLVYQGLTKLLAQQIVVMSSSSQSEWVLLTQQLNAEFEGAHLEYLRQNKLYLRKQDKIVTFGKSNKDDFRKTGYDGRG
|::|| |||::: :|: : : |: :| :: :|: :|: || |: : || ||= | || : || |||||: |:|

GSVLVISGLTRMIEEQMKISQNDSRKDWQIFCEQMRSELSGAKLDNVNQNFLYVTK-DKKLRFGLVG-DDFRKSDDKGQG
40 50 60 70 80 90 100 

657 687 717 747 777 807 837 867 

YQPMVYGLDNCQMSQTKSMVKLVFYFKDGLKRTFYYDFKEET*SWHPFASYCIGCCIYTRLTVLSSKNIGNRKTVS*PN*
||||:| | :: ::::|: | :| :| | | | : 
YQPMLYDLKGAKIQAEENLIKITIDFDNGGERVFIYRFTDTK 
120 130 140 150 

SEQ ID 398 (GBS6) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 1 (lane 2; MW 40kDa). It was also expressed in E.coli as a His-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 2 (lane 2; MW 15kDa). The GBS6-GST fusion 
product was purified (Figure 189, lane 2) and used to immunise mice. The resulting antiserum was used for 
FACS (Figure 260), which confirmed that the protein is immunoaccessible on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 119 

A DNA sequence (GBSx0124) was identified in S.agalactiae <SEQ ID 401> which encodes the amino acid 
sequence <SEQ ID 402>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3831 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: 1 MNFEKIETAYELILENIQTIENQLKTHIYDALIEQNSYYLGSSCDLDMWVNNQKLRQLD 60

M + + YEL+ E I+N+L+ +AL E Y D + + +QK +QL 

Sbjct: 1 MQKDHVGRVYELIJ^\A.IMIKNELQISYIEAIAEAaEMYFLEKTD-QLKLPADQICrKQLQ 59 

Query: 61 LSQE EW - RRTFQF I F I KSAQTEQLQANHQFTPDS I GFI LLFIjLEE - LTSQE 109 

E EW R+ FQ +K + + HQ TP0+IG + +L+ + + ++ 

Sbjct: 60 ALLEKAEFGTYEHEWVRKAFQLAVLKGMK-DISHPNRQMTPDTIGLFISYLVNKFMADKK 118

Query: 110 TVDVLEIGSGTGNLAQTLLNN-SSKELNYMGIEVDDLLIDLSASIAEIIGSSAQFIQEDA 168

+ +L+ GTGNL T+LN S K N GIE+DD+L+ ++ + A ++ + +D+
Sbjct: 119 ELTILDPALGTGNLLFTVLNQLSEKTANSFGIEIDDVLLKIAYAQAKLLKKELELFHQDS 178

Query: 169 VRPQILKESDVIISDLPVGYYPNDGIAKRYAVSSSKEHTYAHHLLMEQSLKYLKKDGIAI 228 

+ P + D +I DLPVGYYPND A+ + + + + H++AHHL +EQS+K+ K G
Sbjct: 179 LEPLFIDPVDTVICDLPVGYYPNDEGAEAFELKADEGHSFAHHLFIEQSVKHTKPGGYLF 238 



Query: 286 ETFV 

+ + L N++ M + + F +W ++ 
Sbjct: 299 QILLANLPSFSNQKAMLDMMAQFDEWFKK 327 

A related DNA sequence was identified in S.pyogenes <SEQ ID 403> which encodes the amino acid 
sequence <SEQ ID 404>. Analysis of this protein sequence reveals the following: 

Possible site: 57 

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 223/315 (70%) , Positives = 270/315 (84%) 

Query: 1 MNFEKIETAYELILENIQTIENQLKTHIYDALIEQNSYYLGSSCDLDMWVNNQKLRQLD 60

M FEKIE AY+L+LEN Q I EN LKTHIYDA++EQNS+YLG+ V N+ KL+ L 

Sbjct: 16 MTFEKIEEAYQLLLENCQLIENDLKTHIYDAIVEQNSFYLGAEGASPQVAQNSDKLKALC 75 

Query: 61 LSQEEWRRTFQFIFIKSAQTEQLQANHQFTPDSIGFILLFLLEELTSQETVDVLEIGSGT 120 

L++EEWR+ +QF+FIK+AQTEQLQANHQFTPD+IGFILL+LLE+L+ +++++VLEIGSGT
Sbjct: 76 LTKEEWRKAYQFLFIKaAQTEQLQMIHQFTPDAIGFILLYLLEQLSDKDSLEVLEIGSGT 135 

Query: 121 GNLAQTLLNNSSKELNYMGIEVDDLLIDLSASIAEIIGSSAQFIQEDAVRPQILKESDVI 180

GNLAQTLLNN+SK L+Y+GIE+DDLLIDLSASIAEI+ SSA FIQEDAVRPQ+LKESD++
Sbjct: 136 GNLAQTLLNNTSKSLDYVGIELDDLLIDLSASIAEIMDSSAHFIQEDAVRPQLLKESDIV 195

Query: 181 ISDLPVGYYPNDGIAKRYAVSSSKEHTYAHHLLMEQSLKYLKKDGIAIFLAPENLLTSPQ 240

ISDLPVGYYPND IAKRY V+SS +HTYAHHLLMEQSLKYLKKDG AIFLAP NLLTSPQ 
Sbjct: 196 ISDLPVGYYPNDDIAKRYKVASSDKHTYAHHLLMEQSLKYLKKDGFAIFLAPVNLLTSPQ 255 

Query: 241 SDLLKEWLKGYADVIAVLTLPETIFGSRQNAKSIFVLKKQAEQKPETFVYPLTDLQNREN 300 

S LLK+WLK YA V+ ++TLP++IFG NAKSI VL+KQ + ETFVYP+ DL+ EN 
Sbjct: 256 SQLLKQWLKDYAQVVTLITLPDSIFGHPSNAKSIIVLQKQTDHPMETFVYPIRDLKIAEN 315

Query: 301 MANFIENFQKWSREN 315 

+ +F+ENF+KW N 
Sbjct: 316 IHDFMENFKKWKLSN 330

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 120 

A DNA sequence (GBSx0125) was identified in S.agalactiae <SEQ ID 405> which encodes the amino acid 
sequence <SEQ ID 406>. This protein is predicted to be acetate kinase (ackA-1). Analysis of this protein 
sequence reveals the following:

Possible site: 15 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2384 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC36857 GB:L17320 acetate kinase [Bacillus subtilis] 
Identities = 223/395 (56%) , Positives = 293/395 (73%) , Gaps = 3/395 (0%) 



Query: 1 MSKTIAINAGSSSLKWQLYEMPEEKVVAKGIIERIGLKDSISTVKFDDKKDEQILDIVDH 60

MSK IAINAGSSSLK+QL+EMP E V+ KG++ERIG+ DS+ T+ + +K+ ++ DI DH 
Sbjct: 1 MSKIIAINAGSSSLKFQLFEMPSETVLTKGLVERIGIADSVFTISVNGEKNTEVTDIPDH 60

Query: 61 TQAVKILLEDLTKHGIIKDFNEITGVGHRVVAGGEYFKESALVDDKVVEQVEELSALAPL 120

AVK+LL LT+ GIIKD NEI G+GHRVV GGE F +S L+ D+ ++++E++S LAPL
Sbjct: 61 AVAVKMLLNKLTEFGIIKDLNEIDGIGHRVVHGGEKFSDSVLLTDETIKEIEDISELAPL 120

Query: 121 HNPAAAAGIRAFREILPDITSVCVFDTAFHTTMQPHTYLYPIPQKYYTDYKVRKYGAHGT 180 

HNPA GI+AF+E+LP++ +V VFDTAFH TM +YLY +P +YY + +RKYG HGT 
Sbjct: 121 HNPANIVGIKAFKEVLPNVPAVAVFDTAFHQTMPEQSYLYSLPYEYYEKFGIRKYGFHGT 180 

Query: 181 SHQYVAQEAAKQLGRPLEELKLITAHVGNGVSITANYHGQSIDTSMGFTPLAGPMMGTRS 240

SH+YV + AA+ LGRPL++L+LI+ H+GNG SI A G+SIDTSMGFTPLAG MGTRS
Sbjct: 181 SHKYVTERAAELLGRPLKDLRLISCHLGNGASIAAVEGGKSIDTSMGFTPLAGVAMGTRS 240 

Query: 241 GDIDPAIIPYLVANDPELEDAAAVVNMLNKQSGLLGVSGTSSDMRDIEAGLQSKDPNAVL 300

G+IDPA+IPY++ + D V+N LNK+SGLLG+SG SSD+RDI + + A 
Sbjct: 241 GNIDPALIPYIMEKTGQTAD--EVLNTLNKKSGLLGISGFSSDLRDIVEATKEGNERAET 298 

Query: 301 AYNVFIDRIKKFIGQYLAVLNGADAIIFTAGMGENAPLMRQDVIAGLSWFGIELDPE-KN 359 

A VF RI K+IG Y A ++G DAIIFTAG+GEN+ +R+ V+ GL + G+ DP N
Sbjct: 299 ALEVFASRIHKYIGSYAARMSGTDAIIFTAGIGENSVEVRERVLRGLEFMGVYWDPALNN 358






Query: 360 VFGYFGDITKPDSKVKVLVIPTDEELMIARDVERL 394

V G I+ P S VKV++IPTDEE+MIARDV RL
Sbjct: 359 VRGEEAFISYPHSPVKVMIIPTDEEVMIARDVVRL 393

A related DNA sequence was identified in S.pyogenes <SEQ ID 407> which encodes the amino acid
sequence <SEQ ID 408>. Analysis of this protein sequence reveals the following: 

Possible site: 28 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.22 Transmembrane 63 - 79 ( 63 - 79)

Final Results

bacterial membrane --- Certainty=0.1086 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:AAC36857 GB:L17320 acetate kinase [Bacillus subtilis] 
Identities = 218/395 (55%) , Positives = 293/395 (73%) , Gaps = 3/395 (0%) 

Query: 1 MSKTIAINAGSSSLKWQLYQMPEEAVLAQGIIERIGLKDSISTVKYDGKKEEQILDIHDH 60 

MSK IAINAGSSSLK+QL++MP E VL +G++ERIG+ DS+ T+ +G+K ++ DI DH
Sbjct: 1 MSKIIAINAGSSSLKFQLFEMPSETVLTKGLVERIGIADSVFTISVNGEKNTEVTDIPDH 60 

Query: 61 TEAVKILLNDLIHFGIIAAYDEITGVGHRVVAGGELFKESVVVNDKVLEQIEELSVLAPL 120

AVK+LLN L FGII +EI G+GHRVV GGE F +SV++ D+ +++IE++S LAPL
Sbjct: 61 AVAVKMLLNKLTEFGIIKDLNEIDGIGHRVVHGGEKFSDSVLLTDETIKEIEDISELAPL 120



Query: 121 HNPGAAAGIRAFRDILPDITSVCTFDTSFHTSMAKHTYLYPIPQKYYTDYKVRKYGAHGT 180 

HNP GI+AF+++LP++ +V VFDT+FH +M + +YLY +P +YY + +RKYG HGT 
Sbjct: 121 HNPANIVGIKAFKEVLPNVPAVAVFDTAFHQTMPEQSYLYSLPYEYYEKFGIRKYGFHGT 180

Query: 181 SHKYVAQEAAKMLGRPLEELKLITAHIGNGVSITANYHGKSVDTSMGFTPLAGPMMGTRS 240 

SHKYV + AA++LGRPL++L+LI+ H+GNG SI A GKS+DTSMGFTPLAG MGTRS
Sbjct: 181 SHKYVTERAAELLGRPLKDLRLISCHLGNGASIAAVEGGKSIDTSMGFTPLAGVAMGTRS 240 

Query: 241 GDIDPAIIPYLIEQDPELKDAADVVNMLNKKSGLSGVSGISSDMRDIEAGLQEDNPDAVL 300

G+IDPA+IPY++E+ + D +V+N LNKKSGL G+SG SSD+RDI +E N A
Sbjct: 241 GNIDPALIPYIMEKTGQTAD--EVLNTLNKKSGLLGISGFSSDLRDIVEATKEGNERAET 298



Query: 360 VFGYRGDISTPESKVKVLVISTDEELCIARDVERL 394 

V G IS P S VKV++I TDEE+ IARDV RL 
Sbjct: 359 VRGEEAFISYPHSPVKVMIIPTDEEVMIARDVVRL 393

An alignment of the GAS and GBS proteins is shown below: 

Identities = 332/395 (84%) , Positives = 365/395 (92%) 



Query: 1 MSKTIAINAGSSSLKWQLYEMPEEKVVAKGIIERIGLKDSISTVKFDDKKDEQILDIVDH 60

MSKTIAINAGSSSLKWQLY+MPEE V+A+GIIERIGLKDSISTVK+D KK+EQILDI DH 
Sbjct: 1 MSKTIAINAGSSSLKWQLYQMPEEAVLAQGIIERIGLKDSISTVKYDGKKEEQILDIHDH 60 


Query: 61 TQAVKILLEDLTKHGIIKDFNEITGVGHRVVAGGEYFKESALVDDKVVEQVEELSALAPL 120

T+AVKILL DL GII ++EITGVGHRVVAGGE FKES +V+DKV+EQ+EELS LAPL
Sbjct: 61 TEAVKILLNDLIHFGIIAAYDEITGVGHRVVAGGELFKESVVVNDKVLEQIEELSVLAPL 120

Query: 121 HNPAAAAGIRAFREILPDITSVCVFDTAFHTTMQPHTYLYPIPQKYYTDYKVRKYGAHGT 180

HNP AAAGIRAFR+ILPDITSVCVFDT+FHT+M HTYLYPIPQKYYTDYKVRKYGAHGT
Sbjct: 121 HNPGAAAGIRAFRDILPDITSVCTFDTSFHTSMAKHTYLYPIPQKYYTDYKVRKYGAHGT 180

Query: 301 AYNVFIDRIKKFIGQYLAVLNGADAIIFTAGMGENAPLMRQDVIAGLSWFGIELDPEKNV 360

AYN+FIDRIKK IGQY AVLNGADA++FTAGMGENAPLMRQDVI GL+WFG+++DPEKNV 
Sbjct: 301 AYNIFIDRIKKCIGQYFAVLNGADALVFTAGMGENAPLMRQDVIGGLTWFGMDIDPEKNV 360 

Query: 361 FGYFGDITKPDSKVKVLVIPTDEELMIARDVERLK 395
FGY GDI+ P+SKVKVLVI TDEEL IARDVERLK 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 121 

A DNA sequence (GBSx0126) was identified in S.agalactiae <SEQ ID 409> which encodes the amino acid 
sequence <SEQ ID 410>. This protein is predicted to be repressor protein. Analysis of this protein sequence 
reveals the following: 

an uncleavable N-term signal seq 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB49550 GB:AJ248284 repressor protein, putative [Pyrococcus 
abyssi] 

Identities = 39/64 (60%) , Positives = 49/64 (75%) 

Query: 1 MKNSLQKLRKSRKLSQAELAVALGVTRQTIISLEKEKYTASLELAFKIARYFDKQIEEVF 60 

MKN L++ R+ L+Q ELA LGVTRQTII++EK KY SL LAFKIAR+F +IE++F 
Sbjct: 1 MKNRLREFREKYGLTQEELARILGVTRQTIIAIEKGKYDPSLRLAFKIARFFGVRIEDIF 60 

Query: 61 IYTE 64 
IY E

Sbjct: 61 IYEE 64 

A related DNA sequence was identified in S.pyogenes <SEQ ID 411> which encodes the amino acid
sequence <SEQ ID 412>. Analysis of this protein sequence reveals the following: 

Possible site: 40 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4344 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 29/66 (43%) , Positives = 44/66 (65%) 

Query: 1 MKNSLQKLRKSRKLSQAELAVALGVTRQTIISLEKEKYTASLELAFKIARYFDKQIEEVF 60

+KN L++LR ++Q E+A GV+RQTI +E+ +YT S+ +A KIA+ F + +EEVF 
Sbjct: 10 LKNRLKELRARDGINQTEMAKIAGVSRQTISLIERNEYTPSVIIAMKIAKVFQEPVEEVF 69 

Query: 61 IYTESE 66
Sbjct: 70 RLVEVE 75 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
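
The "Identities" figure reported for an alignment is the number of aligned positions at which the query and subject carry the same residue. As a worked illustration, the sketch below recounts the identities for the alignment between the GBS protein of Example 121 and the Pyrococcus abyssi repressor protein, using the two aligned sequences exactly as listed above. The function name `count_identities` is a hypothetical helper, and the integer division used for the percentage is an assumption that happens to reproduce the reported value for this alignment.

```python
def count_identities(query, subject):
    """Count aligned columns where both sequences have the same residue."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same number of columns")
    # A gap character never counts as an identity.
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")


# Aligned rows of the Example 121 GENPEPT hit (positions 1-60 and 61-64).
query = (
    "MKNSLQKLRKSRKLSQAELAVALGVTRQTIISLEKEKYTASLELAFKIARYFDKQIEEVF"
    "IYTE"
)
subject = (
    "MKNRLREFREKYGLTQEELARILGVTRQTIIAIEKGKYDPSLRLAFKIARFFGVRIEDIF"
    "IYEE"
)

identical = count_identities(query, subject)
columns = len(query)
print(f"Identities = {identical}/{columns} ({100 * identical // columns}%)")
# prints "Identities = 39/64 (60%)"
```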

Example 122 

A DNA sequence (GBSx0127) was identified in S.agalactiae <SEQ ID 413> which encodes the amino acid 
sequence <SEQ ID 414>. Analysis of this protein sequence reveals the following:

Possible site: 32 

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -8.97 Transmembrane
INTEGRAL Likelihood =
INTEGRAL Likelihood = -7.80 Transmembrane 123 - 139 ( 118 - 145)
INTEGRAL Likelihood =
INTEGRAL Likelihood =
Transmembrane 123 -
Transmembrane 177 -
Transmembrane 81 -

Final Results

bacterial membrane --- Certainty=0.4588 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>


A related GBS nucleic acid sequence <SEQ ID 9491> which encodes amino acid sequence <SEQ ID 9492>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 



Query: 104 MQGVKDTANQTVIMELTKQLPLALMLIFAIIGAPIMEEIIFRYIIPKELFAKHQKWGFVI 163

MQG TAN + +++L + L+++ I APIMEEI+FR I L + +1 
Sbjct: 1 MQGHTTTANDSTLIKLFSGVSPVLVVLLLGIAAPIMEEIVFRGGIIGYLVENNALLAILI 60 



Query: 164 GTLAFALIHSPSDIGSFIIYAGMGAILSFVYYKTEHLEYSIMIHFINN ALAYSVL 218 

+ F +IH P++ SF +Y MG ILS YYKT+ L SI IHF+NN A+AY ++

Sbjct: 61 SSFLFGIIHGPTNFISFGMYFFMGIILSVSYYKTKDLRVSISIHFLNNLFPAIAIAYGLI 120 



A related DNA sequence was identified in S.pyogenes <SEQ ID 415> which encodes the amino acid
sequence <SEQ ID 416>. Analysis of this protein sequence reveals the following: 

Possible site: 24
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-11.41 Transmembrane 12 - 28 ( 1 - 30)
INTEGRAL Likelihood = -9.98 Transmembrane 41 - 57 ( 33 - 64)
INTEGRAL Likelihood = -8.33 Transmembrane ( 121 - 151)
INTEGRAL Likelihood = -7.96 Transmembrane 83 - 99 ( 76 - 103)
INTEGRAL Likelihood = -3.77 Transmembrane 208 - 224 ( 207 - 230)
INTEGRAL Likelihood = -2.13 Transmembrane 182 - 198 ( 182 - 199)



Final Results 

bacterial membrane --- Certainty=0.5564 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 



>GP:BAA11325 GB:D78257 ORF8 [Enterococcus faecalis]
Identities = 47/120 (39%), Positives = 70/120 (58%), Gaps = 8/120 (6%)

Query: 105 GQQVSANDAAIHTLARLIKGGFPLYTALFVLVIAFIAPIMEELVFRGFPMIDLFKGKSLK 164
G +AND+ TL +L G P+ L VL++ APIMEE+VFRG + L + +L
Sbjct: 3 GHTTTANDS TLIKLFSGVSPV LVVLLLGIAAPIMEEIVFRGGIIGYLVENNAL- 55

Sbjct: 56 LAILISSFLFGIIHGPTNFISFGMYFFMGIILSVSYYKTKDLRVSISIHFLNNLFPAIAI 115

An alignment of the GAS and GBS proteins is shown below: 

Identities = 72/229 (31%), Positives = 114/229 (49%), Gaps = 24/229 (10%) 





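[Alignment largely illegible in the source. Block start positions: Query 11, 63, 119, 172; Sbjct 2, 118, 178. Surviving fragments: "QAKQDTFIKQQKM--" and the match line "APIMEE++FR +LF K K ++ +L FAL".]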



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 123 

A DNA sequence (GBSx0128) was identified in S.agalactiae <SEQ ID 417> which encodes the amino acid 
sequence <SEQ ID 418>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0826 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC06504 GB:AE000676 pyrroline carboxylate reductase [Aquifex aeolicus]
Identities = 97/259 (37%), Positives = 159/259 (60%), Gaps = 4/259 (1%)

Query: 1 MKIGIIGVGKM--ASAIIQGLKQTQHDIIISGSCLERSKEIAERLDVTYAESHQSLINQA 58
M++GI+G G M A A+ K + +II++ E+ + +A + + +A + L + +
Sbjct: 8 MRVGIVGFGNMGQAFALCFSKKLGKENIIVTDKVQEK-RNLATEMGIAFASDVKFLADNS 66

Query: 59
D++++ +KP+ ++VL L K II 3+ AG-8+ H
Sbjct: 67

Query: 118
) FTALAGS PA+++ FI+ALA
Sbjct: 127

Query: 178
AGV GF EQAL I TV+ S++ L + Q + ++LI + SPGGTTI G+ LE+ G
Sbjct: 187

Query: 238 LTHSVISAIDATIEKAKKL 256
+V+ I+ T +KAKKL
Sbjct: 247 FKGTVMECINRTSQKAKKL 265

A related DNA sequence was identified in S.pyogenes <SEQ ID 419> which encodes the amino acid 
sequence <SEQ ID 420>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1043 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 180/256 (70%), Positives = 208/256 (80%)

Query: 1 MKIGIIGVGKMASAIIQGLKQTQHDIIISGSCLERSKEIAERLDVTYAESHQSLINQADI 60
MKIGIIGVGKMASAII+GLKQT H++IISGS LERSKEIAE+L + YA SHQ LI+Q D+
Sbjct: 1 MKIGIIGVGKMASAIIKGLKQTPHELIISGSSLERSKEIAEQLALPYAMSHQDLIDQVDL 60

Query: 61 IMLGIKPQLFEKVLLPLDITKPIISMAAGISLARLSQLTRSDLPLIRIMPNINAQILQSC 120
++LGIKPQLFE VL PL +PIISMAAGISL RL+ DLPL+RIMPN+NAQILQS
Sbjct: 61 VILGIKPQLFETVLKPLHFKQPIISMAAGISLQRLATFVGQDLPLLRIMPNMNAQILQSS 120

Query: 121 TAICYNNHVSDELRQLAKEITDSFGSSFDIAETOFDTFTAIAGSSPAYIYLFIEALAKAG 180
TA+ N VS EL+ +++TDSFGS+FDI+E +FDTFTAIAGSSPAYIYLFIEALAKAG
Sbjct: 121 TALTGNALVSQELQARVRDLTDSFGSTFDISEKDFDTFTALAGSSPAYIYLFIEAIAKAG 180

Query: 181 VKYGFPKEQALSIVGQTVLASSQNLLQGQNSTSDLIDNICSPGGTTIAGLLDLEKNGLTH 240
VK G PK +AL IV QTVLAS+ NL S D ID ICSPGGTTIAGL++LE+ GLT
Sbjct: 181 VKNGIPKAKALEIVTQTVLASASNLKTSSQSPHDFIDAICSPGGTTIAGLMELERLGLTA 240

Query: 241 SVISAIDATIEKAKKL 256
+V SAID TI+KAK L
Sbjct: 241 TVSSAIDKTIDKAKSL 256
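
The percentages printed in the alignment headers of this section can be reproduced from the raw counts using integer arithmetic. The sketch below records an observation about the figures shown here and is not a description of the alignment program's internals: the function name `alignment_percentages` is hypothetical, and the rule used for the Positives figure (truncated identity percentage plus truncated percentage of non-identical positive matches) is an assumption that happens to agree with the headers in this section.

```python
def alignment_percentages(identities, positives, length, gaps=0):
    """Return (identity %, positive %, gap %) as whole numbers.

    Assumption: each percentage is truncated rather than rounded, and
    the Positives figure is the truncated identity percentage plus the
    truncated percentage of non-identical positive matches.
    """
    ident_pct = 100 * identities // length
    conservative_pct = 100 * (positives - identities) // length
    gap_pct = 100 * gaps // length
    return ident_pct, ident_pct + conservative_pct, gap_pct


# Counts copied from alignment headers in this section.
print(alignment_percentages(180, 208, 256))     # GAS/GBS alignment, Example 123
print(alignment_percentages(97, 159, 259, 4))   # Aquifex aeolicus hit, Example 123
print(alignment_percentages(47, 70, 120, 8))    # Enterococcus faecalis hit
```

A check of this kind may be useful for spotting transcription errors in the counts, since a header whose percentages cannot be reproduced from its counts has probably been damaged.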

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 124 

A DNA sequence (GBSx0129) was identified in S.agalactiae <SEQ ID 421> which encodes the amino acid 
sequence <SEQ ID 422>. Analysis of this protein sequence reveals the following: 

Possible site: 58

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3405 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database:

>GP:CAA56994 GB:X81089 glutamyl-aminopeptidase [Lactococcus lactis]
Identities = 219/354 (61%), Positives = 273/354 (76%), Gaps = 1/354 (0%)

Query: 3 DLFNKIKTVTELDGIAGYEHNIRNFLRQEITPLVDQVETDGLGGIFGVKMTHETNAPKVM 62
+LF+K+K +TE+ +G+E +R++L+ + h Q E DGLGGIF K + NAP++M
Sbjct: 2 ELFDKVKALTEIQATSGFEGPVRDYLKARMVELGYQPEFDGLGGIFVTK7ASKVENAPRIM 61

Query: 63 VAAHMDEVGFMVSHIQPDGTFRVLEVGGWNPLWSSQRFTLYTRSGDAIPVISGSVPPHF 122
VAAHMDEVGFMVS I+ DGTFRV+ +GGWNPLWS QRFTL+TR+G IPV++G +PPH
Sbjct: 62

Query: 123
+P ISDI+FDG F + EA FQIA GD+I+P++ETIL+AN K+I+SKAWD
Sbjct: 122

Query: 183
Sbjct:

Query: 243
Sbjct: 242

Query: 302
SI G+PSTTIGV ARYIHSHQT++ +DDFLQAQ +L+AI+ L+ V IK Y
Sbjct: 302 NIGGVPSTTIGWARYIHSHQTIFNIDDFLQAQTFLRAIITSEMTEKVAEIKNY 355

A related DNA sequence was identified in S.pyogenes <SEQ ID 423> which encodes the amino acid 
sequence <SEQ ID 424>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2747 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 276/355 (77%), Positives = 322/355 (89%)

Query: 1 MSDLFNKIKTVTELDGIAGYEHNIRNFLRQEITPLVDQVETDGLGGIFGVKMTHETNAPK 60
M+DLF+KIK VTELDGIAGYEH++R++LR +ITPLVD+VETDGLGGIFG++++ AP+
Sbjct: 1 MTDLFSKIKEVTELDGIAGYEHSVRDYLRTKITPLVDRVETDGLGGIFGIRDSKAEKAPR 60

Query: 61 VMVAAHMDEVGFMVSHIQPDGTFRVLEVGGWNPLWSSQRFTLYTRSGDAIPVISGSVPP 120
++VAAHMDEVGFMVS I+ DGT RV+ +GGWNPLWSSQRFTLYTR+G IP+ISGSVPP
Sbjct: 61 ILVAAHMDEVGFMVSDIKVDGTLRWGIGGWNPLWSSQRFTLYTRTGQVIPLISGSVPP 120

Query: 121 HFLRGQSGGTTLPKISDIVFDGGFTDKNEAESFGIAPGDIIVPKSETILTANQKHIMSKA 180
HFLRG +G +LP I DIVFDGGFTDK EAE FGI PGDII+P+SETILTANQK+I+SKA
Sbjct: 121 HFLRGANGSASLPHIEDIVFDGGFTDKAEAERFGITPGDIIIPQSETILTANQKNIISKA 180

Query: 181 WDNRYGVLMVTELLKSLKDQSLSNTLIAGAHVQEEVGLRGAHVSTTKFNPDIFLAVDCSP 240
WDNRYGVLM+TE+L++LK Q L+NTLIAGAHVQEEVGLRGAHVSTTKF+P++F AVDCSP
Sbjct: 181 WDNRYGVLMITEMLEM,KGQDLNNTLIAGAHVQEEVGLRGAHVSTTKFDPELFFAVDCSP 240

Query:
Sbjct:

Query: 301 LKNSGIPSTTIGVCARYIHSHQTLYAMDDFLQAQAYLQAIWKLDRSTVDIIKGY 355
L+N G+PSTTIGVCARYIHSHQTLYAMDDF++AQA+LQAI+ KLDRSTVD+IK Y
Sbjct: 301 LQNGGVPSTTIGVCARYIHSHQTLYAMDDFVFAQAFLQAIIKKLDRSTVDLIKCY 355

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 125 

A DNA sequence (GBSx0130) was identified in S.agalactiae <SEQ ID 425> which encodes the amino acid
sequence <SEQ ID 426>. Analysis of this protein sequence reveals the following:

Possible site: 26

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1672 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
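
Each "Final Results" block in these Examples lists three candidate localisations with a certainty value, and the Example is then read according to the highest-scoring one. A minimal sketch of that reading step follows, assuming the three-line layout used throughout this section. The names `parse_final_results` and `best_localisation` are hypothetical, and the pattern is deliberately tolerant of stray spaces inside the numbers; other kinds of damage to the text are not handled.

```python
import re

# Tolerates stray spaces inside the number, e.g. "Certainty=0. 1672"
# or "Certainty=0 . 0000", as well as the clean form "Certainty=0.1672".
RESULT_RE = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*(\d)\s*\.\s*(\d+)\s*\(([^)]*)\)"
)


def parse_final_results(lines):
    """Return a list of (site, certainty, verdict) tuples."""
    results = []
    for line in lines:
        match = RESULT_RE.search(line)
        if match:
            site, whole, frac, verdict = match.groups()
            results.append((site, float(f"{whole}.{frac}"), verdict.strip()))
    return results


def best_localisation(lines):
    """Return the entry with the highest certainty, or None if nothing parsed."""
    results = parse_final_results(lines)
    return max(results, key=lambda r: r[1]) if results else None


# Values taken from the Final Results block of Example 125.
example_125 = [
    "bacterial cytoplasm --- Certainty=0.1672 (Affirmative) < succ>",
    "bacterial membrane --- Certainty=0. 0000 (Not Clear) < succ>",
    "bacterial outside --- Certainty=0 . 0000 (Not Clear) < succ>",
]
print(best_localisation(example_125))
```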

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 126 

A DNA sequence (GBSx0131) was identified in S.agalactiae <SEQ ID 427> which encodes the amino acid 
sequence <SEQ ID 428>. Analysis of this protein sequence reveals the following:

Possible site: 31

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -2.28 Transmembrane 18 - 34 ( 17 - 34)

Final Results

bacterial membrane --- Certainty=0.1914 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

A related DNA sequence was identified in S.pyogenes <SEQ ID 429> which encodes the amino acid 
sequence <SEQ ID 430>. Analysis of this protein sequence reveals the following: 

Possible site: 21

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -6.16 Transmembrane 12 - 28 ( 8 - 30)

Final Results

bacterial membrane --- Certainty=0.3463 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
An alignment of the GAS and GBS proteins is shown below: 

Identities = 30/91 (32%), Positives = 48/91 (51%)

Query: 13 MKNKKILFGTGIAGVGLLAAAGYTLTKKVTDYKRQQITQTLREFFSQMGDIQVFYFNEFE 72
M KKI +G+ G L G + D +R+Q+T+ LR FFS +G I+V Y N +
Sbjct: 4 MSKKKIGMISGIFGFSLAIGLGIVIKDYCQDRQRRQMTRDLRTFFSPLGQIEVLYINPCQ 63

Query: 73 SDIKMTSGGLVLEDGRIFEFIYRQGVLDYVE 103
SGG+V+ +G+ ++F Y + + E
Sbjct: 64 VKQDYISGGVVMSNGKQYQFTYHSRQISFEE 94

A related GBS gene <SEQ ID 8497> and protein <SEQ ID 8498> were also identified. Analysis of this
protein sequence reveals the following:

Lipop Possible site: -1 Crend: 4
SRCFLG: 0
McG: Length of UR: 21
Peak Value of UR: 2.30
Net Charge of CR: 3
McG: Discrim Score: 6.28
GvH: Signal Score (-7.5): -1.46
Possible site: 19
>>> Seems to have a cleavable N-term signal seq.
Amino Acid Composition: calculated from 20
ALOM program count: 0 value: 22.60 threshold:
PERIPHERAL Likelihood = 22.60 29
modified ALOM score: -5.02

*** Reasoning Step: 3

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



SEQ ID 8498 (GBS214) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 40 (lane 3; MW 13.9kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 46 (lane 6; MW 39kDa). 
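
The two apparent molecular weights reported for the fusion products can be compared against each other as a rough plausibility check. The sketch below assumes tag sizes that are not stated in this document (a GST moiety of roughly 26 kDa and a short His tag of roughly 1 kDa); the function name `tag_difference_check` and the 3 kDa tolerance are likewise illustrative choices, not values taken from the text.

```python
# Apparent molecular weights (kDa) reported above for SEQ ID 8498 (GBS214).
HIS_FUSION_KDA = 13.9
GST_FUSION_KDA = 39.0

# Assumed tag sizes, NOT taken from this document: a GST moiety of
# roughly 26 kDa and a short His tag of roughly 1 kDa.
ASSUMED_GST_TAG_KDA = 26.0
ASSUMED_HIS_TAG_KDA = 1.0


def tag_difference_check(his_kda, gst_kda, tolerance_kda=3.0):
    """Compare the observed GST/His size difference with the assumed tag sizes."""
    observed = gst_kda - his_kda
    expected = ASSUMED_GST_TAG_KDA - ASSUMED_HIS_TAG_KDA
    return observed, expected, abs(observed - expected) <= tolerance_kda


observed, expected, consistent = tag_difference_check(HIS_FUSION_KDA, GST_FUSION_KDA)
print(f"observed difference: {observed:.1f} kDa, "
      f"expected about {expected:.1f} kDa, consistent: {consistent}")
```

Under these assumptions the two bands differ by about the size expected if both fusions carry the same insert, although gel mobility is only an approximate guide to mass.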

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 127 

A DNA sequence (GBSx0132) was identified in S.agalactiae <SEQ ID 431> which encodes the amino acid
sequence <SEQ ID 432>. This protein is predicted to be thioredoxin HI (trxA). Analysis of this protein
sequence reveals the following:

Possible site: 40

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2350 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 



Query: 14 IDSTKKWFFFTADWCPDCQFIYPVMPSIEKDFSDFVFVRVNRDDYIELAQQWNIFGIPS 73
+ + + WF F+ADWCPDC+ I P +P +E+ + ++ F VNRDD+IEL Q+ +IFGIPS
Sbjct: 13 VKNQENWFLFSADWCPDCRVIEPFLPELEQTYDSYQFYYVNRDDFIELCQELDIFGIPS 72

Query: 74 FWVENGQELGRLVNKNRKTKAEITKFLAE 103
F+ NG+E R V+K+RKTK EI +FL E
Sbjct: 73 FLFYSNGEERSRFVSKDRKTKEEIERFLTE 102
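
Thioredoxin-family proteins characteristically carry a pair of cysteines in a C-x-x-C arrangement, and both sequences in the alignment above contain such a pair. The sketch below locates the motif in a short fragment copied from the query line; the function name `find_cxxc_motifs` is hypothetical, and positions are counted within the fragment, not within the full-length protein.

```python
import re

# Fragment of the query sequence as it appears in the alignment above.
query_fragment = "FFFTADWCPDCQF"


def find_cxxc_motifs(sequence):
    """Return (1-based position, motif) for every C-x-x-C occurrence.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    return [(m.start() + 1, m.group(1))
            for m in re.finditer(r"(?=(C..C))", sequence)]


print(find_cxxc_motifs(query_fragment))
```

The presence of the motif is consistent with the thioredoxin prediction but does not by itself establish function.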

A related DNA sequence was identified in S.pyogenes <SEQ ID 433> which encodes the amino acid 
sequence <SEQ ID 434>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1997 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 70/102 (68%), Positives = 81/102 (78%)

Query: 1 MILPESYEEIAAYIDSTKKWFFFTADWCPDCQFIYPVMPSIEKDFSDFVFVRVNRDDYI 60
MI P SYE +A I+ K+V FFTADWCPDCQFIYP+MP IE + +D FV VNRD +I
Sbjct: 1 MIRPTSYESIATLIEKEDKLVLFFTADWCPDCQFIYPIMPEIEAELTDMTFVCVNRDQFI 60

Query: 61 ELAQQWNIFGIPSFWVENGQELGRLVNKNRKTKAEITKFLA 102
E+AQ+WNIFGIPSFW+E GQE+GRLVNK RKTK EI FLA
Sbjct: 61 EVAQKWNIFGIPSFWIEKGQEVGRLVNKMRKTKTEIMHFLA 102

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 128 

A DNA sequence (GBSx0133) was identified in S.agalactiae <SEQ ID 435> which encodes the amino acid 
sequence <SEQ ID 436>. This protein is predicted to be phenylalanyl-tRNA synthetase beta subunit,
non-spirochete. Analysis of this protein sequence reveals the following:

Possible site: 47

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1310 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC00291 GB:AF008220 YtpR [Bacillus subtilis]
Identities = 78/196 (39%), Positives = 125/196 (62%), Gaps = 1/196 (0%)

Query: 5 YNREHVGDTLMVIVKDSQGAKLDVDRRGQVARVYLQDSKETVAWNIFEVSSLIVIEGAGQ 64
YN+E VGDTL++ ++D +L ++ G V +++ ++KET +NIF SS + I+ G
Sbjct: 5 YNKEGVGDTLLISLQDVTREQLGYEKHGDWKIFNNETKETTGFNIFNASSYLTIDENGP 64

Query: 65 ITLSDQDIKILNAELLKEGFEDSLVNNIEPTFWAQ-KEIIDHPDSDHLHICQAEINDGK 124
+ LS+ ++ +N L + G E++LV ++ P FW ++ HP++D L +C+ + + +
Sbjct: 65 VALSETFVQDVNEILNPJfGVEETLVVDLSPKFWGYVESKBKHPNADKLSVCKVNVGE-E 123

Query: 125 TVQIVCGAPNASVGLKTVAALPGAMMPNGSLIFPGKLRGEDSFGMLCSARELALPNAPQV 184
T+QIVCGAPN G K V A GA+MP+G +I +LRG S GM+CSA+EL LP+AP
Sbjct: 124 TLQIVCGAPNVDQGQKVWAKVGAVMPSGLVIKDAELRGVPSSGMICSAKELDLPDAPAE 183

Query: 185 RGIIELSDQVIVGESF 200
+GI+ L G++F
Sbjct: 184 KGILVLEGDYEAGDAF 199

A related DNA sequence was identified in S.pyogenes <SEQ ID 437> which encodes the amino acid 
sequence <SEQ ID 438>. Analysis of this protein sequence reveals the following: 

Possible site: 47

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -1.49 Transmembrane 90 - 106 ( 90 - 107)

Final Results

bacterial membrane --- Certainty=0.1595 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



WO 02/34771 



-206- 



PCT/GB01/04789 



The protein has homology with the following sequences in the databases:

>GP:BAB05970 GB:AP001518 phenylalanyl-tRNA synthetase (beta subunit) [Bacillus halodurans]
Identities = 84/196 (42%), Positives = 124/196 (62%), Gaps = 1/196 (0%)

Query: 5 YNKEQVGDVLMVILQDTKDIKRQVERKGKVARVFAEESGKTLAWNIFEASSLITIEGNGQ 64

Query: 65 IFLTDENLARLNAELAKEGFSERLEPIVGPVFWGQIVEMVAHPDSDHLNICQVAIGKDQ 124
+ LTD +A L K G + LE + P FWG + HP++D L+IC+V +G D
Sbjct: 65 LELTDSLVATLEQAFQKHGVNWTLEVDLSPKFWGFVQSKDKHPNADKLSICKVDVGSD- 123

Query: 125 TVQIVAGAPNAALGLKTIVALPGAIMPNGSLIFPGKLRGEESYGMMCSPRELALPNAPQK 184
T+QIV GAPN G K +VAL GA+MP+G +I P LRG S GM+CS +ELALP+AP++
Sbjct: 124 TLQIVCGAPNVEAGQKWVALEGAVMPSGLVIKPTSLRGVSSTGMICSAKELALPDAPEE 183

Query: 185 RGIIEFDESAVVGEAF 200
+GI+ D+S VG +F
Sbjct: 184 KGILVLDDSYEVGTSF 199

An alignment of the GAS and GBS proteins is shown below: 

Identities = 133/207 (64%), Positives = 167/207 (80%)

Query: 1 MIFTYNREHVGDTLMVIVKDSQGAKLDVDRRGQVARVYLQDSKETVAWNIFEVSSLIVIE 60
MIF YN+E VGD LMVI++D++ K V+R+G+VARV+ ++S +T+AWNIFE SSLI IE
Sbjct: 1 MIFAYNKEQVGDVLMVILQDTKDIKRQVERKGKVARVFAEESGKTLAWNIFEASSLITIE 60

Query: 61 GAGQITLSDQDIKILNAELLKEGFEDSLVNNIEPTFWAQIKEIIDHPDSDHLHICQAEI 120
G GQI L+D+++ LNAEL KEGF + L + P FW QI E++ HPDSDHL+ICQ I
Sbjct: 61 GNGQIFLTDENLARLNAELAKEGFSERLEPIVGPVFWGQIVEMVAHPDSDHLNICQVAI 120

Query: 121 NDGKTVQIVCGAPNASVGLKTVAALPGAMMPNGSLIFPGKLRGEDSFGMLCSARELALPN 180
+ +TVQIV GAPNA++GLKT+ ALPGA+MPNGSLIFPGKLRGE+S+GM+CS RELALPN
Sbjct: 121 GEDQTVQIVAGAPNAALGLKTIVALPGAIMPNGSLIFPGKLRGEESYGMMCSPRELALPN 180

Query: 181 APQVRGIIELSDQVIVGESFDANKHWK 207
APQ RGIIE + +VGE+FD KHWK
Sbjct: 181 APQKRGIIEFDESAVVGEAFDPAKHWK 207

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 129 

A DNA sequence (GBSx0135) was identified in S.agalactiae <SEQ ID 439> which encodes the amino acid 
sequence <SEQ ID 440>. Analysis of this protein sequence reveals the following: 

Possible site: 30

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3052 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB81904 GB:D92974 unknown [Lactococcus lactis] 
Identities = 69/241 (28%), Positives = 117/241 (47%), Gaps = 15/241 (6%) 

Query: 7 YKEMIAKPWGKIQYEITFAQL--SHIKNQNVLDFGAGFCLTEQHLAKEN-NVTAIEPNPK 63
Y E+ KPWG++ Y++ F QL + K+ +L FG+GF TE L ++ VT EP+ +
Sbjct: 23 YAEVFEKPWGRMFYDLLFPQLLPNLTKDSKILSFGSGFGRTETFLEEQGFEVTGYEPDVE 82






Query: 64 LLYDNQSDNIYKILGSYEALRD-LPDQSFBTIICHNVLEYIDKHNHPAYFDEFSRLLKPN 122
L ++ G+++ + + ++ +D I+ HNVLEY+ + + Lh
Sbjct: 83 KLEMMSDQTFRQLTGTFDDFAETVKNERYDVIIiIHNVLEYV--LDRKWLELLLSLLTDG 140

Query: 123 GELSLIKHNITGKILQSVIFSNDTSTAMBLLTGESNFKSASFDQGNIYT LEELKQ 177
G LS++KH+ G +++ ++ A+++ EA AS + G+I L +
Sbjct: 141 GTLSIVKHSKYGSMIEMAAGRDNPQAALDVYENEA VASHNHGDILVYDDDWLTDFVA 197

Query: 178 NTOLLVERYQGIRTFYSLQPN-HFKTE7GWLNKMLAIELSVADKAPYKDIAFLQHITLKKS 237
N L ++ GIR FY + N K W ML +E VA +A L H+ KKS
Sbjct: 198 NYKLKIQEKFGIRHFYGISQNAEIKETENWYQPMLKLEQKVAKDQTLYPVARLHHLIFKKS 258

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 130 

A DNA sequence (GBSx0136) was identified in S.agalactiae <SEQ ID 441> which encodes the amino acid 
sequence <SEQ ID 442>. Analysis of this protein sequence reveals the following: 

Possible site: 58

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3479 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAF74079 GB:AF212845 putative single stranded binding protein [Lactococcus lactis bacteriophage ul36]
Identities = 64/141 (45%), Positives = 92/141 (64%), Gaps = 10/141 (7%)

Query: 1 MYNKVIMIGRLTAKPEMVKTPTDKSVTRATVAVNRRFKGSNGEREADFINVVMWGRLAET 60
M N V ++GR+T +PE+ TP +K+V T+AVNR FK +NGEREADFI+ V+WG+ AE
Sbjct: 1 MINNVTLVGRITKEPELRYTPQNKAVATFTIAVNRAFKNANGEREADFISCVIWGKSAEN 60

Query: 61 LASYGTKGSLISIDGELRTRKYE-KDGQTHYITEVLASSFQLLESRAQ RAM 110
LA++ KG LI + G ++TR YE + GQ YITEV+AS+FQ+LE Q +
Sbjct: 61 LANWTHKGQLIGVIGNIQTRNYENQQGQRVYITEVVASNFQVLEKSNQANGERISNPASK 120

Query: 111 RENNVSGDLSDLVLEEEELPF 131
+NN S + + +++LPF
Sbjct: 121 PQNNDSFGSDPMEISDDDLPF 141

A related DNA sequence was identified in S.pyogenes <SEQ ID 443> which encodes the amino acid 
sequence <SEQ ID 444>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1817 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 102/131 (77%), Positives = 116/131 (87%)

Query: 1 MYNKVIMIGRLTAKPEMVKTPTDKSVTRATVAVNRRFKGSNGEREADFINVVMWGRLAET 60

Query: 61 LASYGTKGSLISIDGELRTRKYEKDGQTHYITEVLASSFQLLESRAQRAMRENNVSGDLS 120
L SY +KGSL+SIDGELRTRKY+KDGQ HY+TEVL SFQLLESRAQRAMRENNV+ DL
Sbjct: 61 LVSYASKGSLMSIDGELRTRKYDKDGQVHYVTEVLCQSFQLLESRAQRAMRENNVTNDLV 120

Query: 121 DLVLEEEELPF 131
DLVLEE+ LPF
Sbjct: 121 DLVLEEDTLPF 131

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 131 

A DNA sequence (GBSx0137) was identified in S.agalactiae <SEQ ID 445> which encodes the amino acid 
sequence <SEQ ID 446>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2235 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9493> which encodes amino acid sequence <SEQ ID 9494> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAC13072 GB:AL445503 putative hydrolase [Streptomyces coelicolor]
Identities = 63/179 (35%), Positives = 91/179 (50%), Gaps = 2/179 (1%)

Query: 33 IIFDMDGVIVDSEYTFLDNKTEMLREEGI-DTDVSYQYQYMGTTFEFMWQAMKEEFGLPK 91
+IFD+DG +VDSE + + L E G+ D + Y+G + + K +GL
Sbjct: 12 VIFDLDGTLVDSEPHYYEAGRRTIAEYGVPDFSWADHEAYVGISTQETVADWKRRYGLRA 71

Query: 92 TVKEYIAEMNRRRQAIVARDGVRPIKGAQRLIHWLHQHGYRLAVASSSPMVDIKRNLKEL 151
TV+E +ANR +ARR +++LG +AVAS S I L
Sbjct: 72 TVEELLAVKNRHYLGL-ARTSARAYPEMRKFVELLAGEGVPMAVASGSSPEAIAAILART 130

Query: 152 GVTECFEYMVTGEDVSSSKPAPDVFLRAAELLDVDPKVCIVIEDTRNGSLAAKAAGMYC 210
G+ +V+ ++V+ KPAPDVFL AA L +P C+V+ED G+ AA AAGM C
Sbjct: 131 GLDAHLRTWSADEVARGKPAPDVFLEAARRLGTEPARCVVLEDAAPGAAAAHAAG^C 189

A related DNA sequence was identified in S.pyogenes <SEQ ID 447> which encodes the amino acid 
sequence <SEQ ID 448>. Analysis of this protein sequence reveals the following: 

Possible site: 25

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3706 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 62/202 (30%), Positives = 100/202 (48%), Gaps = 1/202 (0%)

Query: 29 MEKVIIFDMDGVIVDSEYTFLDNKTEMLREEGIDTDVSYQYQYMGTTFEFMWQAMKEEFG 88
M K IIFDMDGV+ D+E +L + + + +GI D ++G + +W+ + +
Sbjct: 3 MIKGIIFDMDGVLFDTEPFYLRRREDFFKTKGIPIDHLMSKDFIGGNLQELWKELLGKNR 62

Query: 89 LPKTVKEYIAEMNRRRQAIVARDGVRPIKGAQRLIHWLHQHGYRLAVASSSPMVDIKRNL 148
VK + + +QA I + L + G +LAVAS+S D+ h
Sbjct: 63 DDAIVKAITTDYDAYKQAHKPPYQKLLITEVNSCLEQLEKCGIKLAVASNSKRQDVLIAL 122

Query: 149 KELGVTECFEYMVTGEDVSSSKPAPDVFLRAAELLDVDPKVCIVIEDTRNGSLAAKAAGM 208
+ + + FE ++ EDVS KP PD++ +A + L + K +V+ED++ G AAKAA +
Sbjct: 123 ETTQIKDYFEIILAREDVSRGKPYPDIYNKAVQKIXSIiQKKQLLVVEDSQKGIAAAKAANL 182

Query: 209 YCFGFANPDYPPQDLSMADKVI 230
F + Y D S AD I
Sbjct: 183 TVFAITDYRY-GIDQSQADHKI 203

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 132 

A DNA sequence (GBSx0138) was identified in S.agalactiae <SEQ ID 449> which encodes the amino acid 
sequence <SEQ ID 450>. Analysis of this protein sequence reveals the following:

Possible site: 20

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.22 Transmembrane 16 - 32 ( 16 - 32)

Final Results

bacterial membrane --- Certainty=0.1086 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 133 

A DNA sequence (GBSx0139) was identified in S.agalactiae <SEQ ID 451> which encodes the amino acid
sequence <SEQ ID 452>. Analysis of this protein sequence reveals the following:

Possible site: 34

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -5.04 Transmembrane 28 - 44 ( 27 - 45)

Final Results

bacterial membrane --- Certainty=0.3017 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 134

A DNA sequence (GBSx0140) was identified in S.agalactiae <SEQ ID 453> which encodes the amino acid 
sequence <SEQ ID 454>. Analysis of this protein sequence reveals the following: 



Possible site: 17

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-10.72 Transmembrane 38 - 54 ( 34 -
INTEGRAL Likelihood = -7.70 Transmembrane - 20 ( 1 - 22)
INTEGRAL Likelihood = -4.99 Transmembrane
INTEGRAL Likelihood = -2.55 Transmembrane 179 -
INTEGRAL Likelihood = -2.39 Transmembrane 93 - 109 ( 93 - 109)
INTEGRAL Likelihood = -1. Transmembrane 116 - 132 ( 116 - 133)
INTEGRAL Likelihood = -0.43 Transmembrane 344 - 360 ( 344 - 360)

Final Results

bacterial membrane --- Certainty=0.5288 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB14853 GB:Z99118 two-component sensor histidine kinase [Bacillus subtilis]
Identities = 254/585 (43%), Positives = 371/585 (63%), Gaps = 9/585 (1%)

Sbjct: 4 LMIMMLERVGIIVILGFIl^TKLFRC^QNQDGYKGKAILISIFSLFSIISireTGIEIQ 63

Query: 61 GDRSLVERPFLTTISHSDSLANTRTLVITTASLVGGPLVGSIVGFIGGVHRFFQGSFSGS 120
+ +V ++ TI S S+ANTR L + L+GGP VG+ +G + G+HRF G +
Sbjct: 64 RISIM-IVNTOWFTIDPSGSIANTRILGVEIGGLLGGPFVGAGIGILAGLHRFSLGGSTAL 122

Query: 121 FYIVSSVLVGIVSGKIGDKLKENHLYPSTSQVILISIIAESIQMLFVGIFT GWEL 175
VSS+L G+++G IG + + P+ L+ I ES+QM+ + + WEL
Sbjct: 123 SCAVSSILAGVLAGLIGRYFTKRYRMPTPRIAALVGIGMESLQMIIILLMAKPFSDAWEL 182

Query: 176 VKMIVIPMMILNSLGSTLFLAILKTYLSNESQLRAVQTRDVLELTRQTLPYLRQGLTPQS 235
V MI IPM+++N GS +FL+I++ + E Q RA++T VL + QTLP+ RQGL S
Sbjct: 183 VSMIGIPMILINGTGSFIFLSIIQAIIRKEEQARALETHRVLTIADQTLPFFRQGLNENS 242

Query: 236 ARSVCEIIKRHTNFDAVGLTDRSNVLAHIGVGHDHHIAGQPVKTDLSKSVIFDGEPRIAQ 295
+SV II + T DAV LTD+ +LAH+G G DHHI + + T LSK VI G A

Query: 296 DKAAISCPDHNCQLNSAIWPLKINDKTVGALKMYFAGDKTMSEVEENLVLGLAQIFSGQ 355
+ I C C L++AIV+PL N T+G LKMYF +S+VEE L GLA +FS Q
Sbjct: 303 SQEEIECTHAECPLHAAIVLPLTSNGNTIGTLKMYFKSPAGLSQVEEELAEGLAMLFSTQ 362

Query: 356 LAMGITEEQNKLASMAEIKALQAQINPHFFFNAINTISALIRIDSDKARYALMQLSTFFR 415
L +G E Q+KL AEIKALQAQ+NPHF FNAINTISAL R D +K R L+QLS +FR
Sbjct: 363 LELGEAELQSKLLKDAEIKALQAQTOPHFLFNAINTISALCRTDVEKTRKLLLQLSVYFR 422

Query: 416 TSLQGGQDREVTLEQEKSHVDAYMNVEKLRFPDKYQLSYDI-SAPEKMKLPPFGLQVLVE 474
++LQG + + L +E +H++AY+++E+ RFP KY++ +I S E++++PPF LQVLVE
Sbjct: 423 SNLQGARQLLIPLSKELNHLNAYLSLEQARFPGKYKIELNIDSRLEQIEIPPFVLQVLVE 482

Query: 475 NAVRHAFKERKTD1SIHILVQIKPDGHYYCVSVSDNGCGISDTIIDKLGQETVAESKGTGTA 534
NA+RHAF +++ + V + D + V+DNG+GI ++ +LG++ +GTGTA
Sbjct: 483 NALRHAFPKKQDICKVTVCVLSDDASVYMKVADNGRGIPPDVLPELGKKPFPSKEGTGTA 542

Query: 535 LVNLNNRLNLLYGSVSCLHFSSD-KNGTKVWYRIPNRIREDEHEN 578
L NLN RL L+G + LH SS+ GT+V +++P + ++ E+
Sbjct: 543 LYNLNQRLIGLFGQQAALHISSEVHKGTEVSFQVPMQQMKEGEEH 587

A related DNA sequence was identified in S.pyogenes <SEQ ID 455> which encodes the amino acid
sequence <SEQ ID 456>. Analysis of this protein sequence reveals the following:

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1771 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 75/245 (30%), Positives = 117/245 (47%), Gaps = 22/245 (8%)

Query: 348 LAQIFSGQL AMGITEEQNKLASMAEIKALQAQINPHFFFNAINTISALIRI-DSD 401
LAQ F+ L M ++ K ++AL +QINPHF +N ++TI + DS
Sbjct: LAQQFNALLDQIDSLMVAVADKEKAIGQYRLQALASQINPHFLYNTLDTIIWMAEFNDSK 53

Query: 402
L E HV Y+ ++K R+ DK LSY++
Sbjct: 54

Query: 461
Sbjct: 121

Query: 519 KLGQETVAESKGTGTALVNLNNRLNLLYGS--VSCLHFSSDKNGTKVWYRIPNR IRE 573
Q +A G L N++ RL L YG +H SD+ T++ +P + +
Sbjct: N-SQSLLARG---GVGLKNVDQRLKLHYGEGYHMTIHSQSDQ-FTEIQLSLPKMHELMAD 235

Query: DEHEN 578
Sbjct: 236



SEQ ID 454 (GBS248d) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 124 (lane 2-4; MW 71kDa). It was also expressed in E.coli as a His-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 124 (lane 5-7; MW 46kDa) and in 
Figure 180 (lane 2; MW 46kDa). 

GBS248d-His was purified as shown in Figure 234, lane 3-4. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 135 

A DNA sequence (GBSx0141) was identified in S.agalactiae <SEQ ID 457> which encodes the amino acid 
sequence <SEQ ID 458>. This protein is predicted to be two-component response regulator (lytT). Analysis 
of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3230 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9495> which encodes amino acid sequence <SEQ ID 9496>
was also identified.

The protein has homology with the following sequences in the GENPEPT database:

Query: 3 MKILILDDEMFARQELSFLVEHSQEVDNPEIFQAEDISEAEKILFRQQIDLIFLDISLSE 62
+++LI+DDEM AR EL++L++ + D EI +AE+I A + Q+ DL+FLD+ LS
Sbjct: 2 LRVLIVDDEMIARDELAYLLKRTN--DEMEINEAENIESAFDQMMDQKPDLLFLDVDLSG 59

Query: 63
f- HPP +VFATAYD YA+KAFE +A+DY+ KPF+++R+ L K KK+
Sbjct: 60

Query: 123
Sbjct: 120

Query: 183
Sbjct: 176

Query: 243
Sbjct: 236

F+++HR+ V+N E I+3IQPWFN T LIM +G K PV R+Y K+L



A related DNA sequence was identified in S.pyogenes <SEQ ID 459> which encodes the amino acid
sequence <SEQ ID 460>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3818 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 44/148 (29%) , Positives = 84/148 (56%) , Gaps = 5/148 (3%) 

Query: 5 ILILDDEMFARQELSFLVEHSQ-EVDNPEIFQAEDISEAEKILFRQQIDLIFLDISLSEE 63 

+LI++DE RQ + LV+ SQ ++D + +AE+ A + ++ D++ DI++ + 
Sbjct: 4 LLIVE;DEYLWC53IRSLvDFSQFKIDR--vNEAENGQLAWDLFQI<EPYDIVLTDINMPKL 61 

Query: 64 NGFTLANQLSQLAHPPLiWFATAYD - -NYAVKAFESNA VBYIMKPFEQQRVDMALjSKVKK 121 

NG LA + Q + +VF T YD NYA+ A + A DY++KPF + V+ L K++K 
Sbjct: 62 NGIQLRELIKQESPQTHLVFLTGYDDFITYALSALKLGADDYLLKPFSKADVEDMLGKLRK 121 

Query: 122 LSQLTTASDVEQAI PKKASVELLTLTLS 149 

Sbjct: 122 KLELSKKTETIQELVEQPQKEVSAIAMA 149 
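The summary line that heads each alignment gives counts over the aligned length together with a percentage. A short sketch of how such a line might be parsed and cross-checked is given here; it assumes that the printed percentages are rounded down, which holds for the GAS/GBS summary line of this example but is not stated in the text. The function names are illustrative only.

```python
import re

# One "name = count/length (percent%)" field of the summary line.
# Optional whitespace is allowed everywhere so uneven spacing is harmless.
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)

def parse_summary(line):
    """Return {field: (count, aligned_length, reported_percent)}."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in FIELD.findall(line)
    }

def truncated_percent(count, length):
    """Integer percentage rounded down (assumed reporting convention)."""
    return count * 100 // length

# Summary line of the GAS/GBS alignment in this example.
SUMMARY = "Identities = 44/148 (29%) , Positives = 84/148 (56%) , Gaps = 5/148 (3%)"
```

Comparing `truncated_percent(count, length)` with the reported percentage for each field is a quick way to spot a digit that was misread during text extraction.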

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 136 

A DNA sequence (GBSx0142) was identified in S.agalactiae <SEQ ID 461> which encodes the amino acid 
sequence <SEQ ID 462>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0266 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 137 

A DNA sequence (GBSx0143) was identified in S.agalactiae <SEQ ID 463> which encodes the amino acid 
sequence <SEQ ID 464>. Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-11.89 Transmembrane 104 - 120 ( 99 - 134) 

INTEGRAL Likelihood = -5.89 Transmembrane 47 - 63 ( 46 - 65) 

INTEGRAL Likelihood = -3.29 Transmembrane 22 - 38 ( 21 - 39) 

INTEGRAL Likelihood = -2.81 Transmembrane 74 - 90 ( 70 - 92) 

Final Results 

bacterial membrane --- Certainty=0.5755 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
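In the membrane-located examples of this section, the printed membrane certainty appears to track the most negative INTEGRAL likelihood, roughly as 0.1 + 0.04 × |likelihood| (for instance, -11.89 gives about 0.5756 against the printed 0.5755). This relation is inferred here from the printed numbers only; it is an assumption, not a formula stated in the text or confirmed for the prediction program. The sketch below applies it as a plausibility check on a results block.

```python
import re

# Captures the likelihood value of each INTEGRAL row.
INTEGRAL_LINE = re.compile(r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)")

def strongest_likelihood(text):
    """Most negative INTEGRAL likelihood found in a block of output."""
    values = [float(v) for v in INTEGRAL_LINE.findall(text)]
    if not values:
        raise ValueError("no INTEGRAL rows found")
    return min(values)

def estimated_membrane_certainty(likelihood):
    """Assumed empirical fit to the values printed in these examples."""
    return 0.1 + 0.04 * abs(likelihood)

# INTEGRAL rows of Example 137 (printed membrane certainty: 0.5755).
EXAMPLE_137 = """
INTEGRAL Likelihood =-11.89 Transmembrane 104 - 120 ( 99 - 134)
INTEGRAL Likelihood = -5.89 Transmembrane 47 - 63 ( 46 - 65)
INTEGRAL Likelihood = -3.29 Transmembrane 22 - 38 ( 21 - 39)
INTEGRAL Likelihood = -2.81 Transmembrane 74 - 90 ( 70 - 92)
"""
```

A printed certainty that differs from the estimate by more than rounding noise may point to a misread digit in either the likelihood or the certainty.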

A related GBS nucleic acid sequence <SEQ ID 8499> which encodes amino acid sequence <SEQ ID 8500> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB14851 GB:Z99118 similar to hypothetical proteins from B. subtilis [Bacillus 
subtilis] 

Identities = 50/110 (45%), Positives = 82/110 (74%), Gaps = 2/110 (1%) 

Query: 20 QMSIYAAILLVSQMISMLLPKSLPIPTTVIGLVLMYVLLTAKIIIWEWVDSFGALMISMI 79 
Q I+A I+LVS MI+ ++P +PIP +V+GLVL+++LL K+IK+E V++ G + S+I 
Sbjct: 12 QAFIFAVIMLVSI#1IAAIVP--IPIPASWGLVLLFLLLCLKVTKLEQVETLGTSLTSLI 69 

Query: 80 GFMEVPSGISVAANLDILKAEGLQLVAVITISTVVMLVWAYVARLILAI 129 
GF+FVPSGISV +L +++ GLQ+V VI H-+T+++L ++LIL++ 
Sbjct: 70 GFLFVPSGISVM^ISLGVMQQYGLQIVLVILLATIILLGATGLFSQLILSL 119 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 138 

A DNA sequence (GBSx0144) was identified in S.agalactiae <SEQ ID 465> which encodes the amino acid 
sequence <SEQ ID 466>. Analysis of this protein sequence reveals the following: 

Possible site: 44 
>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood =-12.21 Transmembrane 219 - 235 ( 208 - 241) 

INTEGRAL Likelihood =-11.94 Transmembrane 103 - 119 ( 99 - 133) 

INTEGRAL Likelihood = -5.57 Transmembrane 157 - 173 ( 154 - 175) 

INTEGRAL Likelihood = -1.70 Transmembrane 73 - 89 ( 73 - 89) 



Final Results 

bacterial membrane --- Certainty=0.5883 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB14850 GB:Z99118 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 120/240 (50%) , Positives = 159/240 (66%) , Gaps = 10/240 (4%) 

Query: 1 MELLKTPIPGICFSLILYTIGEHLFICKSKGFFLLQPIiFFAMVSGIVILWLMSKGLGTDVK 60 
ME +P FGI Sh + IG LFKK+KGFFL PLF AMV GI L + 
Sbjct: 1 MESTMSPYFGIWSLAAFGIGTFLFKKTKGFFLFTPLFVAMVLGIAFL KIG 51 

Query: 61 TFYTOAYKPGGDLIFWFLNPATIAFAVPLYKKNDWKKYWVEILSSLVIGMIVSLILIVA 120 
F Y GG++I +FL PATIAFA+PLYK+ D +KKYW +I++S++ G I S+ ++ 
Sbjct: 52 GFSYADYNNGGEIIKFFLEPATIAFAIPLYKQRDKLKKYWWQIMASIIAGSICSVTIVYL 111 

Query: 121 ISKMVGLSQVGIASMLPQAATTAIALPITAAIGGNTAVTAMAC1LNAVI IYALGKKLVSF 180 
++K + L + SMLPQAATTAIALP++ IGG + +TA A 1 NAVI+YALG + 
Sbjct: 112 IAKGIHLDSAVMKSMLPQAATTAIALPLSKGIGGISDITAFAVIFNAVIVYALGALFLKV 171 

Query: 181 FHLNDSKIGAGLGLGTSGHTVGAA.FALELGELQGAMAA.IAVW1GLVVDLVIPIFSHLIG 240 
F + +1 GL LGTSGH +G A +E+GE++ AMA+IAWV+G+V LVIP+F LIG 
Sbjct: 172 FKVK-NPISKGIALGTSGHALGVAVGIEMGEVFAAt^SIAVWVGVVTVLVIPVFVQLIG 230 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 139 

A DNA sequence (GBSx0145) was identified in S.agalactiae <SEQ ID 467> which encodes the amino acid 
sequence <SEQ ID 468>. Analysis of this protein sequence reveals the following: 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>






An alignment of the GAS and GBS proteins is shown below: 
Identities = 508/542 (93%) , Positives = 523/542 (95%) 

Query: 1 MTKYLKYISFVALFLASIFIiVACQNQNSCTKERTRKQRPKDELWSMGAKLPHEFDPICDR 60 
++KYLKY S + LFL + LVACQ Q QTKER RKQRPKDELWSMGAKLPHEFDPKDR 
Sbjct: 3 VSKYLKYFSI ITLFMGLIIjVACQQQKPQTKERQRKQRPKDELWSMGAKLPHEFDPKDR 62 

Query: 61 YGIHNEGNITHSTLLKRSPELDIKGEIAKKYKISKDGLTWSFDIiNDDFKFSNGEPVTADD 120 
YG+HNEGNITHSTLLKRSPELDIKGEIAK Y +S+DGLTWSFDL+DDFKFSNGEPVTADD 

VKFTYDMLKADGKAViDLTFIK3^73WGKNQWIHLTEMSTFTAQLTEIPIvPKKHYNDK 
Sbjct: 123 VKFTYDMLKADGKAWDLTFIK2>KffiWGKKQVNIHLTK^SrFTAQLTEIPIVPKIQTYNDK 182 

Query: 181 YKSNPIGSGPYMVKEYKAGEQAIFVRNPYWHGKKPYFKKmTOjLDENTALAAL 240 

MIYATPELA KKVKGTRLLDI SNDVRGLSLPYVKKGV+ +SPDGYPVGNDVTSDPAIRK 

Query: 301 ALTIGLNRQKVLDTVLNGYGKPAYS I IDRTPFWNPKTAIKDNKVAKAKQLLTKAGWKEQA 360 
ALTIGLNRQKVLDTVLNGYGKPAYSIID+TPFt^NPKTAIKDNKVAKAKQLLTKAGWKEQA 
Sbjct: 303 ALTIGI^QKVLDTVMGYGKPAYSIIDKTPFWNPKTAIKDNKVAKAKQLLTKAGWKEQA 362 

Query: 361 DGSRKKG^KSEFDLYYPTNDQLRANIAVEVAEQAKALGITIKLKASNWDEMATKSHDSA 420 
DGSRKKG+L + FDLYYPTNDQIiRfiinAVWAEGAKftLGITIKLKASNWDEMATKSHDSA 
Sbjct: 363 DGSRKKGDLDAAFDLYYPTNDQLRANIAVIWAEQAKAIfilTIKLKASNWDEMATKSHDSA 422 

Query: 421 LLYAGGRHHAQQFYESHYPSIAGKGWTNITFYl^PTVTKYLDKAMTSPDLDKANKYWEGjA 480 
LLYAGGRHHfiQQFYESH+PSIiASKGWTNITFYNNPTVTKYLDKAMTS DLDKAM+YWKXA 
Sbjct: 423 LLYAGGRHHAQQFYESHHPSIAGKGWTNITFYNNPTVTKYLDKAMTSSDLDKANEYWKLA 482 

Query: 481 QWDGKTGASTLGDLPNWLVSUfflTYIGDKRINVGKQGVHSHGHDWSLLTNIAEWTWDES 540 
QWDGKTGASTLGDLPWl^LVSLNHTYIGDKRINVGKQGVHSHGHDWSLLTNIAEWTWDES 
Sbjct: 483 QTOGKTGASTLGDLPNVWLVSLNHTYIGDKRINVGKQGVHSHGHDWSLLTNIAEVJTWDES 542 

Query: 541 AK 542 
K 
Sbjct: 543 TK 544 



There is also homology to SEQ ID 60. 

A related GBS gene <SEQ ID 8501> and protein <SEQ ID 8502> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: 22 Crend: 5 
McG: Discrim Score: 10.46 
GvH: Signal Score (-7.5): -1.29 
Possible site: 22 
>>> May be a lipoprotein 

ALOM program count: 0 value: 7.27 threshold: 0.0 
PERIPHERAL Likelihood = 7.27 386 
modified ALOM score: -1.95 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

SEQ ID 8502 (GBS106) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 18 (lane 3; MW 61kDa). 

The GBS106-His fusion product was purified (Figure 194, lane 2) and used to immunise mice. The 
resulting antiserum was used for Western blot (Figure 255A), FACS (Figure 255B), and in the in vivo 
passive protection assay (Table III). These tests confirm that the protein is immunoaccessible on GBS 
bacteria and that it is an effective protective immunogen. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 140 

A DNA sequence (GBSx0146) was identified in S.agalactiae <SEQ ID 469> which encodes the amino acid 
sequence <SEQ ID 470>. Analysis of this protein sequence reveals the following: 

Possible site: 41 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4862 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 141 

A DNA sequence (GBSx0147) was identified in S.agalactiae <SEQ ID 471> which encodes the amino acid 
sequence <SEQ ID 472>. Analysis of this protein sequence reveals the following: 

Possible site: 19 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -7.27 Transmembrane 252 - 268 ( 249 - 275) 

INTEGRAL Likelihood = -5.73 Transmembrane 67 - 83 ( 62 - 90) 

INTEGRAL Likelihood = -5.26 Transmembrane 107 - 123 ( 104 - 134) 

INTEGRAL Likelihood = -3.77 Transmembrane 153 - 169 ( 152 - 170) 

Final Results 

bacterial membrane --- Certainty=0.3909 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
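Each INTEGRAL row gives a likelihood, a core transmembrane span and, in parentheses, a wider flanking span. As a hedged illustration of how such rows could be read into a structure and sanity-checked, the sketch below parses the rows of this example. The names `Segment`, `parse_segments` and `is_consistent` are hypothetical, and the check that the core lies inside the parenthesised span is an assumption drawn from the rows printed in this section.

```python
import re
from typing import NamedTuple, Tuple

class Segment(NamedTuple):
    likelihood: float
    core: Tuple[int, int]       # e.g. (252, 268)
    extended: Tuple[int, int]   # e.g. (249, 275), the parenthesised span

SEGMENT_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_segments(text):
    """Return a Segment for every fully legible INTEGRAL row in text."""
    return [
        Segment(float(lik), (int(a), int(b)), (int(c), int(d)))
        for lik, a, b, c, d in SEGMENT_LINE.findall(text)
    ]

def is_consistent(segment):
    """True when the core span lies inside the parenthesised span."""
    (a, b), (c, d) = segment.core, segment.extended
    return c <= a <= b <= d

# INTEGRAL rows of Example 141.
EXAMPLE_141 = """
INTEGRAL Likelihood = -7.27 Transmembrane 252 - 268 ( 249 - 275)
INTEGRAL Likelihood = -5.73 Transmembrane 67 - 83 ( 62 - 90)
INTEGRAL Likelihood = -5.26 Transmembrane 107 - 123 ( 104 - 134)
INTEGRAL Likelihood = -3.77 Transmembrane 153 - 169 ( 152 - 170)
"""
```

Rows that fail to parse, or that parse but fail `is_consistent`, are good candidates for a digit damaged during text extraction.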

A related GBS nucleic acid sequence <SEQ ID 9295> which encodes amino acid sequence <SEQ ID 9296> 
was also identified. 

The protein differs from U78968 at the N-terminus: 

Query: 1 MASVNYDTSLTPVQYKAIAHHYGLDKPAPVQYFIWLKNFIQGHLGTSLVYRQPVIDIIRS 60 
MASVNYDTSLTP QYKAIAHHYGLDKPA VQYFIWLKN IQG LGTSLVYRQPV DIIRS 
Sbjct: 39 MASVNYDTSLTPAQYKAIAHHYGLDKPALVQYFIWLKNVIQGDLGTSLVYRQPVSDIIRS 98 

There is also homology to SEQ ID 64. 

A related GBS gene <SEQ ID 8471> and protein <SEQ ID 8472> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10 
McG: Discrim Score: 3.72 
GvH: Signal Score (-7.5): -5.37 
Possible site: 40 
>>> Seems to have an uncleavable N-term signal seq 
ALOM program count: 5 value: -7.27 threshold: 0.0 
INTEGRAL Likelihood = -7.27 Transmembrane 290 - 306 ( 287 - 313) 
INTEGRAL Likelihood = -5.89 Transmembrane 12 - 28 ( 11 - 33) 
INTEGRAL Likelihood = -5.73 Transmembrane 105 - 121 ( 100 - 128) 
INTEGRAL Likelihood = -5.26 Transmembrane 145 - 161 ( 142 - 172) 
INTEGRAL Likelihood = -3.77 Transmembrane 191 - 207 ( 190 - 208) 
PERIPHERAL Likelihood = 2.97 245 
modified ALOM score: 1.95 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.3909 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

SEQ ID 8472 (GBS436) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 173 (lane g ; MW 54kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 142 

A DNA sequence (GBSx0148) was identified in S.agalactiae <SEQ ID 473> which encodes the amino acid 
sequence <SEQ ID 474>. This protein is predicted to be a transmembrane transport protein DppC (oppC). 
Analysis of this protein sequence reveals the following: 

Possible site: 39 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -8.28 Transmembrane 77 - 93 ( 68 - 101) 

INTEGRAL Likelihood = -7.80 Transmembrane 182 - 198 ( 180 - 204) 

INTEGRAL Likelihood = -7.06 Transmembrane 112 - 128 ( 104 - 132) 

INTEGRAL Likelihood = -5.10 Transmembrane 239 - 255 ( 235 - 258) 

Final Results 

bacterial membrane --- Certainty=0.4312 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

There is homology to SEQ ID 68. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 143 

A DNA sequence (GBSx0149) was identified in S.agalactiae <SEQ ID 475> which encodes the amino acid 
sequence <SEQ ID 476>. This protein is predicted to be an ATPase protein DppD. Analysis of this protein 
sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1957 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein differs from U78968 at the C-terminus: 

Query: 241 QTEFARSLWRSLPQQEFLKGVTHDLRG 267 

QTEFAR LWR+LPQQ+FLKGVTHDLRG 
Sbjct: 241 QTEFARRLWRTLPQQDFLKGVTHDLRG 267 
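Because each alignment row carries its own start and end coordinates, a row can be checked for internal consistency: the end coordinate should equal the start plus the number of residues (gap characters excluded) minus one. The sketch below applies this check to the C-terminal row pair of this example and counts identical columns. The function names are illustrative, and the parser assumes a clean row with no whitespace inside the residue string.

```python
import re

# "Query: 241 QTEF...RG 267" -> label, start, residues, end
ROW = re.compile(r"(Query|Sbjct):\s*(\d+)\s+(\S+)\s+(\d+)\s*$")

def parse_row(line):
    """Split one clean alignment row into (label, start, residues, end)."""
    match = ROW.search(line.strip())
    if match is None:
        raise ValueError(f"not a clean alignment row: {line!r}")
    label, start, residues, end = match.groups()
    return label, int(start), residues, int(end)

def coordinates_consistent(start, residues, end):
    """End coordinate must equal start + ungapped length - 1."""
    return start + len(residues.replace("-", "")) - 1 == end

def count_identities(query, sbjct):
    """Number of aligned columns holding the same residue in both rows."""
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")

QUERY_ROW = "Query: 241 QTEFARSLWRSLPQQEFLKGVTHDLRG 267"
SBJCT_ROW = "Sbjct: 241 QTEFARRLWRTLPQQDFLKGVTHDLRG 267"
```

For this row pair both coordinate checks pass, and the three mismatched columns correspond to the blank and the two `+` positions of the printed match line.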

A related DNA sequence was identified in S.pyogenes <SEQ ID 477> which encodes the amino acid 
sequence <SEQ ID 478>. Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1957 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 255/267 (95%) , Positives = 262/267 (97%) 

Query: 1 MTETIitiSIKDLSITFTQYGRFLKPFQSTPIQALNLEIKKGELIAIIGASGSGKSLIjAHai 60 

MTETLLSIKDLSITFTQYGRFLKPFQSTPIQMjNLB+KKGELIAIIGASGSGKSLIAHAI 
Sbjct: 1 MTETLIjSIKDLSITFTQYGRFLKPFQSTPIQALNLEVKKGELLAIIGASGSGKSLLAHAI 60 

Query: 61 MDILPKNaSVTGDMIYRGQSLNSKRIKQLRGKDITLIPQSVNYLDPSTKVKHQVRLGISE 120 

MDILPKNA+VTGDMIYRGQSL SKRIKQLRGK++TLIPQSVNYLDPS KVKHQVRLGISE 
Sbjct: 61 MDILPKNAAVTGDMIYRGQSLTSKRIKQLRGKEKTLIPQSVNYLDPSMKVKHQVRLGISE 120 

Query: 121 NSKATQEGLFQQFGLKESDGDLYPFQLSGGMLRRVLFTTCISDKVSLIIADEPTPGLHPD 180 

N+KATQEGLFQQFGLKESDGDLYPFQLSGGMLRRVLFTTCISD VSLI 1ADEPTPGLHPD 
Sbjct: 121 MAKATQEGLFQQFGLKESDGDLYPFQLSGGMLRRVLFTTCISDTVSLHADEPTPGLHPD 180 

Query: 181 ALQMVLDQLRSFADKGISVIFITHDIVARSQIMRITIFKEGKA.IETAPASFFSGNGEQL 240 

ALQMVLDQLRSFADKGISVIFITHDIVAASQIADRITIFKEGKAIETAPASFFSG GEQL 
Sbjct: 181 ALQMVLDQLRSFADKGISVIFITHDIVAASQIADRITIFKEGKAIETAPASFFSGGGEQL 240 

Query: 241 QTEFARSLWRSLPQQEFLKGVTHDLRG 267 

QTEFAR LWR+LPQQ+FLKGVTHDLRG 
Sbjct: 241 QTEFARRLWRTLPQQDFLKGVTHDLRG 267 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 144 

A DNA sequence (GBSx0150) was identified in S.agalactiae <SEQ ID 479> which encodes the amino acid 
sequence <SEQ ID 480>. This protein is predicted to be an ATPase protein DppE. Analysis of this protein 
sequence reveals the following: 

j N- terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3783 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 481> which encodes the amino acid 
sequence <SEQ ID 482>. Analysis of this protein sequence reveals the following: 

Possible site: 41 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3383 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 188/205 (91%), Positives = 197/205 (95%) 

Query: 1 MTLFAKKLGFYHK1CDQWLFKEIIILEVAPGQVLGIFGQSGCGKTSLSRV1AGF 60 

MTLEAi^GFYHKKDQWLFKEI+LEVAPGQ+LGIFGQSGCGKTSLSRVLAGFL PKSGEV 
Sbjct: 1 MTLEAKKLGFYHKKDQWLFKE I DLEVAPGQ I LG I FGQSGCGKTSLSRVLAGFLQPKSGEV 60 

Query: 61 LVDGSNLPSKAFRPVQLIQQHPEKTMNPLWPMKKSLEEAYYPSRDLLDAFGIQEKWLNRR 120 

LVDGS+LP+KAFRPVQLIQQHPE+TMMPLWEKKKSLEEAYYPS+DL DAFGIQEKWL RR 
Sbjct: 61 LVDGSHLPNKAFRPVQLIQQHPEQTMNPLWPKKKSLEEAYYPSQDLRDAFGIQEKWLKRR 120 

Query: 121 PSELSGGELQRFSIWSLHPETKYLIADEMTTKLDSITQASWKSLLEIVKDRNLGLIVI 180 

PSELSGGELQRFSITOSLHPETKYLIADEMTTMIiDSITQASVWKSLLEIVKDRNLGLI+I 
Sbjct: 121 PSELSGGELQRFSIWSLHPETKYLIADEMTTMLDSITQASVWKSLLEIVKDRNLGIiIII 180 

Query: 181 SHDFAMLEKLCNQCYMIEENRIVSF 205 

SH+F MLEKLC+ CYMIEENR F 
Sbjct: 181 £ 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 145 

A DNA sequence (GBSx0151) was identified in S.agalactiae <SEQ ID 483> which encodes the amino acid 
sequence <SEQ ID 484>. This protein is predicted to be a PTS system, trehalose-specific IIBC component 
(treB). Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Transmembrane 468 - 484 
INTEGRAL Transmembrane 279 - 295 ( 275 - 306) 
INTEGRAL Transmembrane 112 - 128 
INTEGRAL Transmembrane 327 - 343 
INTEGRAL Transmembrane 304 - 320 

Final Results 

bacterial membrane --- Certainty=0.5055 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF94072 GB:AE004175 PTS system, trehalose-specific IIBC 
component [Vibrio cholerae] 
Identities = 225/484 (46%) , Positives = 318/484 (65%) , Gaps = 23/484 (5%) 

Query: 5 KHDAKALLFAIGGKSNISAOTHCATRMRFVLNDSSKAKVKVIEELPSVKGTFTNAGQFQV 64 

K D L+E +GG+ NI++VTHC TR+RFVLN +A +E L VKG FTNAGQFQV 
Sbjct: 10 KQDVTRLIELVGGESNIASVTHCLTRLRFVLNQPEQADKAGLEALSMVKGCFTNAGQFQV 69 

Query: 65 IIGNDVPIFyNAFVAVSGIEGVSKEAAKSAAQKNQNPLQRVLTMLAEIFTPIIPAIIVGG 124 

+IG +V Y + +G + VSK+ AK AA++N N L+R ++ LAEIF P++PAII GG 
Sbjct: 70 VIGTEVDQVYKMLLEQTGKQAVSKDDAKVAARQNMNVLERGISHLAEIFVPLLPAIITGG 129 

Query: 125 LILGFRNILDAVPFEFLGQKVVDGVRQVDSSGHPIWNTLVDVSTFWSGVDSFLWLPGEAI 184 
LILGFRN++ + ++ DG TL ++S FW+ V +FLWL GEAI 

Sbjct: 130 LILGFRNVIGDI RMFDG KTLTEISQFWASVHAFLWLIGEAI 170 

Query: 185 FHFLPVGIWSVTRKMGTTQILGIVLGICLVSPQLLNAYSVASTSAADIAKNWSWNFGYF 244 

F FLPVG+ WS +K+G T ILGI LG+ LVSPQL+NAY + W+FG F 

Sbjct: 171 FFFLPVGVCWSTVKKLGGTPILGITLGVTLVSPQLMNAYLIGKEVPE VWDFGLF 224 

Query: 245 TVQKIGYQAQVI PALLAGLSLSYLE I FWRKHI PEWSM I FVPFLSLVPAI I LAHTVLGPI 304 

++K+GYQAQVIPA+LAG++L+++E R+ +P + ++ VPF+S++ +++LAH +GP 
Sbjct: 225 AIEKVGYQAQVIPAILAGVALAFIENNLRRVVPSYLYLWVPFVSIIVSVVLAHAFIGPF 284 

Query: 305 GWTLGKWISAIVLIGLTGPVKWLFGAIFG74LYAPFVITGLHHMTNAIDTQLIADTKTHTT 364 

G +G ++ +TG + +FG +YAP VITG+HH TNA+D QL+- + T 

Sbjct: 285 GRVIGDGVAFAAKAAMTGDFAVIGSTLFGFMYAPLVITGIEJHTTNAVDLQLMQE--LGGT 342 



Query: 365 GLWPMIALSNIAQGSAVIAyYFMHRHDEKEAQISLPAAISAYLGVTEPALFGVNVKYIYP 424 

+WP+ IALSNIAQ SAV+ + + + E IS+PAAISAYLGVTEPA++G+N+KY +P 
Sbjct: 343 PIWPLIALSNIAQASAWGIIIISK-KQGERDISVPAAISAYLGVTEPAMYGINLKYKFP 401 

Query: 425 FVAGMIGSSVAGLIATTPNVQANSIGVGGLPGFLSIHVKyMGYFFICMAVAIFIPLFLTL 484 
++ MIGS++A + + V AN IGVGGLPG LSI ++ + + M +AI +P LTL 
Sbjct: 402 MLSAMIGSALAaAVCGSAGVMftKGIGVGGLPGILSIQPQFWSIYLVAMLIAILVPAALTL 461 

Query: 485 FFKK 488 
K 

Sbjct: 462 LMYK 465 

A related DNA sequence was identified in S.pyogenes <SEQ ID 485> which encodes the amino acid 
sequence <SEQ ID 486>. Analysis of this protein sequence reveals the following: 



Possible site: 59 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -9.61 Transmembrane 466 - 482 ( 457 - 488) 
INTEGRAL Likelihood = -8.01 Transmembrane 279 - 
INTEGRAL Likelihood = -6.05 Transmembrane 112 - 128 ( 105 - 130) 
INTEGRAL Likelihood = -3.35 Transmembrane 204 - ( 203 - 222) 
INTEGRAL Likelihood = -3.13 Transmembrane 255 - 271 ( 255 - ) 
INTEGRAL Likelihood = -2.07 Transmembrane 327 - 343 ( 325 - 344) 
INTEGRAL Likelihood = -0.59 Transmembrane 422 - 438 ( 422 - 435) 



Final Results 

bacterial membrane --- Certainty=0.4843 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAF94072 GB:AE004175 PTS system, trehalose-specific IIBC 
component [Vibrio cholerae] 

Identities = 231/484 (47%), Positives = 322/484 (65%), Gaps = 28/484 (5%) 



Query: 5 EQDAI<SLLTAIGGKENIKVVTHCATRI4RFVLNDNNKANVK3IEKISVVKGTFTNAGQFQV 64 
+QD L+ +GG+ NI VTHC TR+RFVLN +A+ +E +S+VKG FTNAGQFQV 
Sbjct: 10 KQDVTRLIELVGGESNIASVTHCLTRLRFVLNQPEQADKAGLEALSMVKGCFTNAGQFQV 69 

Query: 65 I IGNDVPVFYNDFTAVSSIEGVSKEAAKSAAKSNQNALQRVMTMLAEI FTPI IPAI IVGG 124 
+ IG +V Y 4- + VSK+ AK AA+ N N L+R ++ LAEIF P++PAII GG 
Sbjct: 70 VIGTEVDQVYKMLLEQTGKQAVSKDDAKVAARQNMNVLERGI SHLAE I FVPLLPAI ITGG 129 

Query: 125 LILGFRNILESVPFEFLGQ^VEKGKLVFDAAGDPVWNTIVRVSPFWSGVNHFLWLPGEAI 184 
LILGFRN++ + +FD T+ +S FW+ V+ FLWL GEAI 
Sbjct: 130 LILGFRNVIGDI RMFDG KTLTEI SQFWAS VHAFLWLIGEAI 170 

Query: 185 FHFLPVGITWSVTRmGTTQILGIVLGICLVSPQLLNAYAVAGTPAAEIAKNWVWDFGFF 244 
F FLPVG+ WS +K+G T ILGI LG+ LVSPQL+NAY + G E VWDFG F 
Sbjct: 171 FFFLPVGVOTSTVKKLGGTPILGITLGVTLVSPQLMNAYLI-GKEVPE VWDFGLF 224 

Query: 245 T INRIGYQAQVI PALLAGLSLAYLE I FWRKR I PEWSMI FVPFLSLI PALI LAHTVLGPI 304 
I ++GYQAQVIPA+LAG++LA++E R+ +P + ++ VPF+S+I +++LAH +GP 
Sbjct: 225 AIEKVGYQAQVIPAILAGVALAFIENNLRRWPSYLYLVVVPFVSIIVSVVLAHAFIGPF 284 

Query: 305 GWTIGKGISFWLAGLTGEVKWLFGAIFGALYAPLVITGLHHMTNAIDTQLIADTATRTT 364 
G IG G++F A +TG + +PG +YAPLVITG+HH TNA+D QL+ + T 
Sbjct: 285 GRVIGDGVAFAAKAAMTGDFAVIGSTLFGFMYAPLVITGIHHTTNAVDLQLMQELG--GT 342 

Query: 365 GLWPMIALSNIAQGSAVFAYYLMNRHEFi?EAEISLPAAI3AYLGVTEPALFGVNVKYVYP 424 
+WP+IALSNIAQ SAV ++++ ++ E +IS+PAAISAYLGVTEPA++G+N+KY +P 
Sbjct: 343 PIWPLIALSNIAQASAVVGIIIISK-KQGERDISVPAAISAYLGVTEPAMYGINLKYKFP 401 

Query: 425 FVAGMIGSGIAGLLSTTFNVQANSIGVGGLPGFMAINVKYMIPFFICMAVAIWPMFLTF 484 
++ MIGS +A + + V AN IGVGGLPG ++I ++ + + M +AI+VP LT 
Sbjct: 402 MLSAMIGSALAAAVCGSAGVMANGIGVGGLPGILSIQPQFWSIYLVAMLIAILVPAALTL 461 

Query: 485 FFRK 488 
Sbjct: 462 LMYK 465 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 501/675 (74%), Positives = 573/675 (84%), Gaps = 2/675 (0%) 

Query: 1 MEQFKHDAKMLEAIGGKEWISAVTHCATRMRFVIibroSSKAKVKVIEELPSVKGTFTNAG 60 

M +F+ DAK+LL AIGGKENI VTHCATRMRFVLND++KA VK IE++ VKGTFTNAG 
Sbjct: 1 MGKFEQDAKSLLTAIGGKENIKVVTHCATRMRFVIJromKANVKEIEKISVVKGTFTNAG 60 

Query: 61 QFQVI IGNDVPI FYNAFVAVSGIEGVS KEAAKBAAQKNQNPLQRVLTMLAEI FTPI IPAI 120 

QFQVIIGNDVP+FYN F AVS IEGVSKEAAKSAA+ NQN LQRV+TMLAE I FTPI I PAI 
Sbjct: 61 QFQVIIGTOVPVFYM3FTAVSSIEGVSKEAAKSAAKSNQNALQRVMTMLAEIFTPIIPAI 120 

Query: 121 IVGGLILGFP^ILDAVPFEFLGQKVVDGVRQVDSSGHPIWNTLVDVSTFWSGVDSFIjWLP 180 

IVGGLILGFRNIL++VPFEFLGQ+V G D++G P+WNT+V VS FWSGV+ FLWBP 
Sbjct: 121 IVGGLILGFRNILESVPFEFLGQQVEKGKLVFDAAGDPVWNTIVRVSPFWSGVJIHFLWLP 180 

Query: 181 GEAIFHFLPVGIWSVTRKMGTTQILGIVLGICLVSPQLLNAYSVASTSAADIAKNWSWN 240 

GEAIFHFLPVGI WSVTRKMGTTQILGIVLGICLVSPQLLMAY+VA T AA+IAKNW W+ 
Sbjct: 181 GFAIFHFLPVGITWSVTRKMGTTQILGIVLGICLVSPQLLNAYAVAGTPAAEIAKMWWD 240 

Query: 241 FGYFTVQKIGYQAQVIPALLAGLSLSYLEIFWRKHIPEWSMIFVPFLSLVPAIILfiHTV 300 

FG+FT+ + IGYQAQVI PALLAGLSL+YLEI FWRK IPEWSMIFVPFLSL+PA+ILAHTV 
Sbjct: 241 FGFFTINR1GYQAQVIPALLAGLSIAYLEIFWRKRIPEWSMIFVPFLSLIPALILAHTV 300 

Query: 301 LGPIGWTLGKWISAIVLIGLTGPVKWLFGAIFGALYAPFVITGLHHMTNAIDTQLIADTK 360 

LGPIGWT+GK IS +VL GLTGPVKWLFGAIFGALYAP VITGLHHMTNAIDTQLIADT 
Sbjct: 301 LGPIGWTIGKGISEVVLAGLTGPVlOTLFGAIFGALYAPLVITGIjHHMTWAIDTQLIADTA 360 

Query: 361 THTTGLWPMIALSNIAQGSAylAYYFraRHDEKEAQISLPAAISAYLGVTEPALFGVNVK 420 

T TTGLWPMIALSNIAQGSAV AYY M+RH+E+EA+ISLPAAISAYLGVTEPALFGVNVK 
Sbjct: 361 TRTTGLWPMIALSNIAQGSAVFAYYLMNRHEEREAEISLPAAISAYLGVTEPALFGVNVK 420 

Query: 421 YIYPFVAGMIGSSVAGLLATTFNVQANSIGVGGLPGFLSINVKYMGYFFICMAVAIFIPL 480 

Y+YPFVAGMIGS +AGLL+TTFNVQANSIGVGGLPGF++INVJCYM FFICMAVAI +P+ 
Sbjct: 421 YWPFVAGMIGSGIAGLLSTTFNVQANSIGVGGLPGFMAINVKYMIPFFICMAVAIWPM 480 

Query: 481 FLTIjFFKKSGILTKTEEEKLVPDAVIASTTETKSAKEKAWSGTKLSWSPLSGLAKPLD 540 

FLT FF+KS I+TKTE+E +P+ + S +A K + GT +++ SPL+G K L 

Sbjct: 481 FLTFFFRKSHIMTKTEDEAKLPETPV-SDAPVATAPHK-TMQGTVITLTSPLTGEVKALS 538 

Query: 541 QASDPVFSQGIMGKGWIDPSDGELVSPVDATVSVLFPTKHAIGLLTSEGVEFIiIHIGMD 600 
+A DPVF+QG+MG+G ++ P++G hV+P DA VSVLFPTKHAI L+T+EG+E L-f-HIGMD 
Sbjct: 539 EATOPVFAQGVMGQGALLQPTEGVLVAPCDAEVSVLFPTKHAICLVTTEGLELLMHIGMD 598 

Query: 601 TVNLEGKGFTSHVAQGDTVKVGDKIiITFDIPMIKEEGYIVETPILITWQQEFRPEELIDL 660 

TVNL+G+GF + V QGD VK G LI FDI IE GY ETP+++TNQ F L 
Sbjct: 599 TVNLDGQGFEALVKQGDQVKAGQTLIQFDIAAISEAGYATETPLVVTNQDVFTVTVEGSL 658 

Query: 661 PKQIKRGQALMVAKK 675 

P+QIK L VA K 
Sbjct: 659 PRQIKVNDKLAVAVK 673 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 146 

A DNA sequence (GBSx0152) was identified in S.agalactiae <SEQ ID 487> which encodes the amino acid 
sequence <SEQ ID 488>. This protein is predicted to be a dextran glucosidase DexS (treC). Analysis of this 
protein sequence reveals the following: 

Possible site: 48 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3493 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB65079 GB:D35633 dextran glucosidase DexS [Streptococcus suis] 
Identities = 383/547 (70%), Positives = 439/547 (80%), Gaps = 13/547 (2%) 

Query: 1 MTIDKRKVWQIYPKSYKDTTGNGVGDLRGIIEKLPYJ^LGIDMVmJ'JPFYPSPQRDNG 60 

MTIDKRKVVYQIYPKSYKDTTGNGVGDLRGI IEKLPYL ELGIDM+WLNPFYPSPQRDNG 
Sbjct: 1 MTIDKRKWYQIYPKSYKDTTGNGVGDLRGI IEKLPYLKELGIDMIWLNPFYPSPQRDNG 60 

Query: 61 YDI SDYTAINPDFGTMDDFEEMI EVGRQYRI DFMLDMVLNHCS I EHEWFKKALAGDRYYQ 120 
YDISDYTA+NPDFGTM DFEEM+ VG++ I +FMLDMVLNHCS +HEWF+KAL+GD+YYQ 

Query: 121 DFFILRDNPTDWSKFGGNAWAPFGDTGKYYLHLFDITQADLNWRNADVRKELFKVVNFW 180 

DFFILRD PTDOTSKFGGNAWAPFGDTGKYYLHLFD+TQADLNWRN +R+ELFKWNFW 
Sbjct: 121 DFFILRDQPTDWSKFGGNAWAPFGDTGKYYLHLFDVTQADIJMRNPHIREELFKVVNFW 180 

Query: 181 RDKGVKGFRFDVINLIGKDEILENCPINDGKPAYTDRPITHDYLKMLNNASFGQDDSFMT 240 

+DKGVKGFRFDVINLIGKDE E+CPINDGKPAYTDRPITHDYLKM+NNA+FG + FMT 
Sbjct: 181 KDKGWGFRFDVINLIGKDFAREDCPINDGKPAYTDRPITHDYLraMNNATFGSEKGFMT 240 

Query: 241 VGEMSSTTIANCILYTAPEREELSMAFNFHHLKVDYKDGQKWTIMAFDFPALRDLFHSWG 300 

VGEMS+TTI NCILYTAPER+ELSMAFNFHHLKVDYKDGQKWTIM FDF L+ LFH+WG 
Sbjct: 241 VGEMSATTIENCILYTAPERKELSI'IAFNFHHLKvDYKDGQKWTIMDFDFEELKHLFHTWG 300 

Query: 301 EGMSEGNGTOALFYNNHDQPRAITOFVDvKRFRNEGATMLAASIHLSRGTPYIYMGEEIG 360 

E MS GNGWNALFYNNHDQPRALNRF+DV+ FR EGATMLAAS IHLSRG 
Sbjct: 301 EEMSVGNGWNTALFYN^DQPPJUjNRFIDVENFRKEGATMLAASIHLSRGNNLTST 355 

Query: 361 MLDPDYSSMDDYVDIESLNAYQIMLDEGKSQEEAFSIIRAKSRDNSRVPMQWDDS 415 

+ SS + + + + + S + +RSR+ P+ 
Sbjct: 356 WVRRSVSSTLTTIAWTTTWTWSLSMPTRCSWTKVTRLSR-PSRLSRPSPVTIPAPRCNGT 414 

Query: 416 - - TNAGFSEGAPWLKVGKSYKEINVAKEKTGLI FTFYQELIRLRKQLPI IADGNYKAAFK 473 

T + PWLK GKSY+ INV +EKTG IFTFY+ LRK+LP+ 1 ++G+ YKAA+ K 

Sbjct: 415 LLTMQASQQATPWLKAGKSYQTINVEQEKTGPIFTFYKRTHPLRKELPLISEGDYKAAYK 474 

Query: 474 DNEKVYAFERHLDKEKLLVLNNFFAEKVKIKLPENYLQGQV1LSNYKDWLDETVTLQPY 533 

D++KVYAFER L+ EKLLVLNNFFAE+V++ L ++Y GQVL+SNY D L + + L+PY 
Sbjct: 475 DSQKvYAFERLLNDEKLLVLNNFFAEEVELDLADDYAHGQVLISNYPDNKLGK^ 534 

Query: 534 QTLAILV 540 

Q LAI V 
Sbjct: 535 QALAIQV 541 

A related DNA sequence was identified in S.pyogenes <SEQ ID 489> which encodes the amino acid 
sequence <SEQ ID 490>. Analysis of this protein sequence reveals the following: 

^ N- terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3631 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 431/539 (79%) , Positives = 486/539 (89%) 

Query: 1 MTIDKRKVVYQIYPKSYKDTTGKGVGDLRGIIEKLPYLAELGIDMVWLNPPYPSPQRDNG 60 

MTIDK+KWYQIYPKSYKDTTGNGVGDL GII+KLPYL ELGIDM+WLNPFYPSPQRDNG 
Sbjct: 1 MTIDKKKWYQIYPKSYKDTTGNGVGDLLGIIDKLPYLQELGIDMIWLNPPYPSPQRDWG 60 

Query: 61 YDISDYTAIMPDFGTMDDFEEMIEVGRQYRIDFMLDMTOIHCSIEHEWFKKAliRGDRYYQ 120 

YD+SDYTA+NPDFGTM DFE +++ ++++I+ MLDMVLNHCS +HEWF4KAIAGD YYQ 
Sbjct: 61 YDVSDYTATOPDFGTMMFENLVKAMEHQIEimDMVlOTCSTDHEWFQKALAGDPYYQ 120 

Query: 121 DFFILF^NPTDWSKFGGNAWAPFGDTGKYYLHLFDITQADLNWRNADWKELFKWNFW 180 

DFFILRD PTDWVSKPGGNAWAPFGDTGKYYLHLFD+TQAD1MWRN W+EL KWNFW 
Sbjct: 121 DFFILRDQPTDWSKFGGNAWAPFGDTGKYYI^LFDVTQADKNWRNPHVREELAKVVNFW 180 

Query: 181 RDKGVKGFRFDVINLIGKDEILENCPINDGKPAYTDRPITHDYtjKMIiNNASFGQDDSFMT 240 

RDKGVKGFRFDVINLIGKDE L +CP+NDGKPAYTDRPITH YL LN ASFGQDDSFMT 
Sbjct: 181 RDKGVKGFRFDVINLIGKDEELVDCPVNDGKPAYTDRPITHTYLHDLNQASFGQDDSFMT 240 

Query: 241 VGEMSSTTIANCILYTAPEREELSMAFNFHHLKVDYKDGQKWTIMAFDFPALRDLFHSWG 300 

VGEMS+TTI NC+LYTAPEREELSMAFNFHHLKVDY++GQKWTIMAFDF ALRDLFH+WG 
Sbjct: 241 VGEMSATTIDNCLLYTAPEREELSMAFWFHHLKVDYEMGQKWTIMAFDFAALRDLFHAWG 300 

Query: 301 EGMSEGNGWNALFYNiraDQPRALNRFVDVKRFRNEGATMIAASIHLSRGTPYIYMGEEIG 360 

EGMS+GNGWNALFYNNHDQPRALKRFVDV FRNEGATMLAASIHLSRGTPYIYMGEEIG 
Sbjct: 301 EGMSQGNGVMALFYOTHDQPRA1MIFVDVTHFRNEGATMIAASIHLSRGTPYIYMGEEIG 360 

Query: 361 MLDPDYSSMDDYVDIESMAYQIMLDEGKSQEEAFSIIRAKSRDNSRVPMQWDDSTNAGF 420 

MLDPD+ SMDDYVD+ESLNAY +L GKS EEAF+II+AKSRDN+R PMQWD S +AGF 
Sbjct: 361 MLDPDFDSMDDYVDVESLNAYSSLLVSGKSAEEAFAIIKAKSRDNARTPMQWDASEHAGF 420 

Query: 421 SEGAPWLKVGKSYKEINVAKEKTGLIFTFYQELIRLRKQLPIIADGKTYKAAFKDKIEKVYA 480 

+ G PWL+VGKSY++INV EK G IF FYQ LI LRK+LPIIA+G+Y+AAFKD++ VYA 
Sbjct: 421 TTGKPWLEVGIffiYRDINVETEKEGRIFPFYQRLIALRKELPIIAEGDYRAAFKDSQAVYA 480 

Query: 481 FERHLDKEKLLVI^FFAEKVKIKLPENYLQ^VLLSNYIODWLDETVTTjQPYQTIAIL 539 

FERHL + LLVLN+F+A++V+++LP Y GQVL+SNY+ V++ E V L+PYQTLAIL 
Sbjct: 481 FERHLGDQCLLVLNHFYADEVELELPPRYQHGQVLISNYEKVSICEKVILKPYQTLAIL 539 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 147 

A DNA sequence (GBSx0153) was identified in S.agalactiae <SEQ ID 491> which encodes the amino acid 
sequence <SEQ ID 492>. Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -3.03 Transmembrane 8 - 24 ( 8-25) 



Final Results 

bacterial membrane --- Certainty=0.2211 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
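
The localisation predictions listed under "Final Results" can be tabulated programmatically. The following is a minimal sketch, not part of the original analysis: the function names are hypothetical, and the pattern assumes each result line has the form "bacterial <compartment> --- Certainty=<value> (<verdict>)", tolerating the stray spaces that appear in this text.

```python
import re

# Hypothetical helper (not from the source): matches one "Final Results" line,
# allowing stray spaces inside the number, e.g. "Certainty=0 . 0000".
_RESULT = re.compile(
    r"bacterial\s+(membrane|outside|cytoplasm)\W*Certainty\s*=\s*"
    r"([0-9]\s*\.\s*[0-9]+)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return {compartment: (certainty, verdict)} for one Final Results block."""
    results = {}
    for compartment, number, verdict in _RESULT.findall(text):
        certainty = float(re.sub(r"\s+", "", number))
        results[compartment] = (certainty, verdict.strip())
    return results


def predicted_location(results):
    """Return the compartment with the highest certainty value."""
    return max(results, key=lambda name: results[name][0])


block = """
bacterial membrane --- Certainty=0.2211 (Affirmative) < succ>
bacterial outside --- Certainty=0 . 0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
"""
results = parse_final_results(block)
print(predicted_location(results))
```

Applied to the values reported for SEQ ID 492, the sketch selects the membrane as the predicted location.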
Example 148

A DNA sequence (GBSx0154) was identified in S.agalactiae <SEQ ID 493> which encodes the amino acid
sequence <SEQ ID 494>. Analysis of this protein sequence reveals the following:

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 



Query: TWIMLVFLARKNLSLYELTVQTKFSIKVIIEQINYLNSFIAKNHLPAIAHSAGRYQLLC3 65
T ++ + AR L + ELT + S + 4- + +NS+L + L A+ + L+
Sbjct: TFILTQLLHARSYLPIQELTQKLNVSRRTVYNDLEKINSWLEEQGLKAV-YKVRSQGLIL 66

[Remaining rows incomplete; surviving fragments in order:]

H R GY++ GDE DK +
ES ++ IQ D L
Query: --DRLEKSLYFIILILCRYQRSVD 231
Y F KE L + L HI PA++R+KYGL
++IK Y +LFLL +K + LE VG 4- D+E4-++ +HF G++R+ G
LI+C NGV +S 4-K 4
I K +P L+QL+TE+ Q + +W+EAI +LAAKPLL G +TESY + MI+
AIPHA+PEDGV +GMS+L L++P

[Detached row labels and start coordinates, in order: Sbjct 67; Query 123; Sbjct 127; Sbjct 187; 232; Sbjct 243; Query 202; Sbjct 302; 342; Sbjct 362; 401; Sbjct 422; Sbjct 482; 520; Sbjct 542; 578; Sbjct 602]



A related DNA sequence was identified in S.pyogenes <SEQ ID 495> which encodes the amino acid 
sequence <SEQ ID 496>. Analysis of this protein sequence reveals the following: 



signal sequence 
INTEGRAL Likelihood = -0.64 Transmembrane 123 - 139 ( 123 -

Final Results

bacterial membrane --- Certainty=0.1256 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 187/624 (29%) , Positives = 327/624 (51%) , Gaps = 20/6;



Query: 61 YQLL--GDEKEHDKIVSLLEAEQFYLTQEERVCLIYLYSFCRREFVSNVHYQDFLKVSKN 118
QL+ D E D + +L YL Q+ER+ +1 +Y +EF+S H + L++S+N
Sbjct: 93 -QLVYPPDLTEKDLLNRMLPTLDDYLFQDERLDMIIIYIMMAKEFISINHLESLLRLSRN 151

[Remaining rows incomplete; surviving match-line fragments in order:]

++L Y R GY G+ -t
L+ I ++L +YF L+
EIi Q Ii H++PAY+RL Y +
LVK++L PLE+Q+G + + E++YF IHFG +L+ S AL +CPNG+SSSL+
+W+EAI RLAA+ PLL I SY + MI+ V E G +1 h
A+PHA PE G +GMS+L L++P

[Detached row labels and start coordinates, in order: Query 1; Sbjct 35; Query 119; Sbjct 152; Query 179; Sbjct 212; Query 237; Sbjct 272; Query 295; Sbjct 324; 355; Sbjct 383; Sbjct 443; 474; Sbjct 503; Query 533; Sbjct 563; Query 593; Sbjct 623]



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 149 

A DNA sequence (GBSx0155) was identified in S.agalactiae <SEQ ID 497> which encodes the amino acid
sequence <SEQ ID 498>. Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3665 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 499> which encodes the amino acid 
sequence <SEQ ID 500>. Analysis of this protein sequence reveals the following: 

Possible site: 22

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3665 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 33/35 (94%), Positives = 35/35 (99%) 

Query: 1 MEKEAKQ1IDLKRNLFKIDVRAQKDEEKVFMRTAW 35

+EKEAKQ+ IDLKRNLFKIDVRAQKDEEKVFMRTAW 
Sbjct: 1 LEKEAKQMIDLKRNLFKIDVRAQKDEEKVFMRTAM 35 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 150 

A repeated DNA sequence (GBSx0156) was identified in S.agalactiae <SEQ ID 501> which encodes the 
amino acid sequence <SEQ ID 502>. This protein is predicted to be a repeat-associated protein in rhsc-phrb 
intergenic region. Analysis of this protein sequence reveals the following: 

Possible site: 44

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.57 Transmembrane 29 - 45 ( 28 - 48)

Final Results

bacterial membrane --- Certainty=0.2826 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A closely-related DNA sequence was identified in S.agalactiae <SEQ ID 1035> which encodes the amino
acid sequence <SEQ ID 1036>. Further related GBS sequences are: <SEQ ID 9067>, <SEQ ID 9068>,
<SEQ ID 9497>, <SEQ ID 9498>, <SEQ ID 9733>, <SEQ ID 9734>.

A related repeated DNA sequence was identified in S.pyogenes <SEQ ID 503> which encodes the amino 
acid sequence <SEQ ID 504>. Analysis of this protein sequence reveals the following: 

Possible site: 44

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.57 Transmembrane 29 - 45 ( 28 - 48)

Final Results

bacterial membrane --- Certainty=0.2826 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
A related GBS gene <SEQ ID 8547> and protein <SEQ ID 8548> were also identified. Analysis of this
protein sequence reveals the following: 



Lipop Possible site: -1 Crend: 5 
McG: Discrim Score: -7.73 
GvH: Signal Score (-7.5): -3.88 

Possible site: 44 
>>> Seems to have no N-terminal signal sequence 
ALOM program count: 1 value: -4.57 threshold: 0.0 

INTEGRAL Likelihood = -4.57 Transmembrane 26 - 42 ( 25 - 
PERIPHERAL Likelihood = 2.12 334 
modified ALOM score: 1.41 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.2826 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related DNA sequence was identified in S.pyogenes <SEQ ID 7071> which encodes the amino acid
sequence <SEQ ID 7072>. An alignment of the GAS and GBS sequences follows: 



Query: 4 MIDFIISIDDCAVELDSRQSWKIRSPLSTILFLVFVCQLAGIETWKEMEDFIEMNEPLFA 63 

MIDFIISIDDCAVELDSRQSWKIR PLSTILFLVWCQIAGIETWKEMEDFIEMNEPLFA 
Sbjct: 1 MIDFIISIDDCAVELDSRQSWKIRYPLSTILFLVFVCQLAGIETWKEMEDFIEMNEPLFA 60 

Query: 64 TYVDLSEGCSSHDTLERVISLVNSDRLKELKVQFEQSLTSLDAVHQLISVDGKTIRGNRG 123 

TYVDLSEGC SHDTLERVISLVNSDRLKELKVQFEQSLTSLDAVHQLISVDGKTIRGNRG 
Sbjct: 61 TYVDLSEGCPSHDTLERVISLVNSDRLKELKVQFEQSLTSLDATOQLISVDGKTIRGNRG 120 

Query: 124 KNQKPVHIVTAYDGGHHLSLGQVAVEEKSNEIVAIPQLLRTIDIRKSIVTIDAMGTQTAI 183 

KNQKPVHIVTAYDGGHHLSLGQvAVEEKSNE.IVAIPQLLRTIDIRKSIVTIDAMGTQTAI 
Sbjct: 121 KNQKPVHIVTAYDGGHHLSLGQVAVEEKSNEIVAIPQLLRTIDIRKSIVTIDAMGTQTAI 180 

Query: 184 VDTI IKGKADYCLAVKGNQETLYDD lALYFSDVKLLEELQENAQYYQTVEKSRGQIEVRE 243 

VDTIIKGKADYCI^VKGNQETLYDDIALYFSDVNLLEELQENAQYYQTVEKSRGQIEvRE 
Sbjct: 181 VDTIIKGKADYCLAVKGNQETLYDDIALYFSDVNLLEELQENAQYYQTVEKSRGQIEVRE 240 

Query: 244 YWVSSDIKWLCQNHPKWHKLRGIGMTRNTIDKDGQLSQENRYFIFSFKPDVLTFANCVRG 303 

YWVSSDIKWLCQNHPKWHKLRGIGMTRNTIDKDGQLSQENRYFIFSFKPDVLTFANCVRG 
Sbjct: 241 YWSSDIKWLCQraPKWHKLRGIGMTRKTIDKDGQLSQENRYFIFSFKPDVLTFANCVRG 300 

Query: 304 HWQIESMHWLLDWYHEDHHQTLDKRAAFNLNLIRKMCLYFLKVMVFPKKDLSYRRKQRY 363

HWQIESMHWLLDVVYHEDHHQTLDKRARFNLNLIRKMCLYFLJCVMVFPKKDLSYRRKQRY 
Sbjct: 301 HWQIESMHWLLDVVYHEDHHQTLDKRAAFT^n^NLIRPCMCLYFLICVMVFPKKDLSYRRKQRY 360 

Query: 364 I S VHLEDYLVQLFGERG 380 

I SVHLEDYLVQLFGERG 
Sbjct: 361 I SVHLEDYLVQLFGERG 377 

A further related DNA sequence was identified in S.pyogenes <SEQ ID 9087> which encodes the amino 
acid sequence <SEQ ID 9088>. A further related DNA sequence was identified in S.pyogenes <SEQ ID 
9089> which encodes the amino acid sequence <SEQ ID 9090>. The GAS and GBS proteins are 100% 
identical. 

There is also homology to SEQ IDs 7018 and 8548. 

SEQ ID 8548 (GBS318) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 46 (lane 5; MW 70kDa). 
GBS318-GST was purified as shown in Figure 203, lane 3.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
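
The "INTEGRAL Likelihood" lines reported above follow a fixed layout and can be read into a small table of predicted membrane-spanning segments. The following is a minimal sketch under that assumption; the function name and the dictionary keys `core` (the first coordinate pair) and `extended` (the pair in parentheses) are labels chosen here for illustration and are not terms used in the source.

```python
import re

# Hypothetical pattern for one line such as:
#   INTEGRAL Likelihood = -4.57 Transmembrane 29 - 45 ( 28 - 48)
_TM = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_transmembrane(text):
    """Return one dict per complete INTEGRAL line found in text."""
    segments = []
    for match in _TM.finditer(text):
        likelihood = float(match.group(1))
        start, end, outer_start, outer_end = (int(g) for g in match.groups()[1:])
        segments.append(
            {
                "likelihood": likelihood,
                "core": (start, end),                   # first coordinate pair
                "extended": (outer_start, outer_end),   # pair in parentheses
            }
        )
    return segments


line = "INTEGRAL Likelihood = -4.57 Transmembrane 29 - 45 ( 28 - 48)"
segments = parse_transmembrane(line)
core_start, core_end = segments[0]["core"]
print(segments[0]["likelihood"], core_end - core_start + 1)
```

Lines whose coordinates were truncated in this text are skipped by the pattern rather than guessed at.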

Example 151 

A DNA sequence (GBSx0157) was identified in S.agalactiae <SEQ ID 505> which encodes the amino acid
sequence <SEQ ID 506>. Analysis of this protein sequence reveals the following: 

Possible site: 34

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database, but there is
homology to SEQ ID 496.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 152 

A repeated DNA sequence (GBSx0158) was identified in S.agalactiae <SEQ ID 507> which encodes the
amino acid sequence <SEQ ID 508>. Analysis of this protein sequence reveals the following: 

Possible site: 48 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1054 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAB03941 GB:AP001507 unknown conserved protein [Bacillus halodurans] 
Identities = 26/82 (31%) , Positives = 52/82 (62%) , Gaps = 2/82 (2%) 

Query: 2 LRIGTACGSGLGSSFMVQMNIESHiKDLGVSDVEVEHYDLGGADPSAADVWIVGRDLEDS 61 
++I CG G G+S +++MN+E++L LG++ +V++ D+ A +D I ++L +S

Sbjct: 1 MKILCTCGLGC<3TSLILKMIWETvlSQI^IA-ADVDNTDVSSASSEQSDFIITSKELAES 59 

Query: 62 -AGHLGDVRILNS I IDMDELRE 82 
AH + I+N+ DM+E+++ 
Sbjct: 60 LASHPSKIVIVMNYFDMEEIKQ 81

A related DNA sequence was identified in S.pyogenes <SEQ ID 509> which encodes the amino acid 
sequence <SEQ ID 510>. Analysis of this protein sequence reveals the following: 

Possible site: 49

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 
Identities = 27/90 (30%) , Positives = 51/90 (56%) , Gaps = 1/90 (1%)

Query: 1 MLRIGTACGSGLGSSFMVQMNIESILKDLGVSDVEVEHYDLGGADPSAADVWIVGRDLED 60 
M++I T CG+G+GSS +++M +E+I LG+ DV+ EDA AD+++ ++ +D 
Sbjct: 8 MIKIVTVCGNGIGSSLLLRMKVEAIASSLGI-DVBAESCDSNAAVGKGADLFVTVKEFKD 66

Query: 61 SAGHLGDVRILNSIIDMDELRELVTGICQE 90

V 1+ S + ++ E + + +E
Sbjct: 67 IFPEDAKVCIWSYTNRKKIEEDLVPVLKE 96

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
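
The summary line printed above each alignment ("Identities = a/n (p%) , Positives = ...") can be checked for internal consistency, which is useful where the text has been damaged. The following is a minimal sketch with hypothetical function names. It assumes that the three fractions on one line share the same alignment length and that the identity percentage is the truncated integer quotient, which is consistent with the identity values printed in this section.

```python
import re

# Hypothetical pattern for one "Name = count/length (percent%)" field.
_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_blast_stats(line):
    """Return {field: (count, alignment_length, printed_percent)}."""
    stats = {}
    for name, count, length, percent in _FIELD.findall(line):
        stats[name] = (int(count), int(length), int(percent))
    return stats


def check_stats(stats):
    """Return a list of human-readable inconsistencies (empty if none)."""
    problems = []
    lengths = sorted({length for _count, length, _percent in stats.values()})
    if len(lengths) > 1:
        problems.append("alignment lengths differ: %s" % lengths)
    if "Identities" in stats:
        count, length, percent = stats["Identities"]
        expected = count * 100 // length  # assumption: truncated percentage
        if expected != percent:
            problems.append(
                "Identities: %d/%d gives %d%%, printed %d%%"
                % (count, length, expected, percent)
            )
    return problems


good = parse_blast_stats("Identities = 27/90 (30%) , Positives = 51/90 (56%) , Gaps = 1/90 (1%)")
# Deliberately corrupted line (one denominator altered) to show what is flagged.
bad = parse_blast_stats("Identities = 27/80 (30%) , Positives = 51/90 (56%)")
print(check_stats(good))
print(len(check_stats(bad)))
```

The check is only a screening aid: it flags a damaged line but does not say which of the printed numbers is the wrong one.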

Example 153 

A DNA sequence (GBSx0159) was identified in S.agalactiae <SEQ ID 511> which encodes the amino acid
sequence <SEQ ID 512>. Analysis of this protein sequence reveals the following:

Possible site: 20 

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 154 

A DNA sequence (GBSx0160) was identified in S.agalactiae <SEQ ID 513> which encodes the amino acid
sequence <SEQ ID 514>. This protein is predicted to be sgaT. Analysis of this protein sequence reveals the
following:



Possible site: 16

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-14.97 Transmembrane 424 - 440 ( 411 -
INTEGRAL Likelihood = -8.86 Transmembrane 224 - 240 ( 221 - 248)
INTEGRAL Likelihood = -7.27 Transmembrane 134 - 150 ( 124 - 167)
INTEGRAL Likelihood = -7.11 Transmembrane 321 - 337 ( 314 - 349)
INTEGRAL Likelihood = -6.64 Transmembrane 379 - 395 ( 370 - 397)
INTEGRAL Likelihood = -6.21 Transmembrane 96 - 112 ( 94 - 115)
INTEGRAL Likelihood = -6.05 Transmembrane 267 - 283 ( 257 - 289)
INTEGRAL Likelihood = -3.13 Transmembrane 18 - 34 ( 17 - 35)
INTEGRAL Likelihood = -2.55 Transmembrane     - 167 ( 151 - 167)
INTEGRAL Likelihood = -0.32 Transmembrane     - 58 ( 42 - 58)

Final Results

bacterial membrane --- Certainty=0.698S (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB52363 GB:AL109747 putative integral membrane protein 
[Streptomyces coelicolor A3 (2) ] 
Identities = 202/453 (44%) , Positives = 292/453 (63%) , Gaps = 22/453 (4%) 
VGFLWSGGTGIVQNSLNPF 65



Query: 66 GIMFEHAFHLVGWPMSrEaiVAVM.TKYGSATiU^IMLAGMIFNILIARFTKFKYIFLTGH 125 

G+M + GV+P NEAIV +A +++G+ A +M+ G + ++ +ARFT +Y+FLTGH 

Sbjct: 70 GRMIQGTTGTHGVIPTNEAIVGIAQSEFGARVAMLMILGFLVSLALARFTPLRYVFLTGH 129 

Query: 126 HTLYMACMIAVIFAVAGFTSFSLILFGGLALGIIMSVSPAFVQKYMIQLTGNDKVALGHF 185 
H L+MA ++ ++ A AG S +++L GG+ +GI++ PAF + ++TGND +A+GHF 

Query: 186 GSLGYWLSGFIGGIVGDKSKSTEDIKFPKSLSFLRDSTVSITISMAIIYLIVAV 239 

G+ GY +SG G +VG S+STE++K P+ L FLRDS V+ +SM +IYL++++ 
Sbjct: 190 GTAGYIVSGATGQLVGKNSRSTEEMKLPEGLRFLRDSMVATALSMVLIYLVMSLLFLAKV 249 

Query: 240 FAGEAYIAKEISNGVNGLVYALQLAGQFAAGVFVILAGVRLILGEIVPAFKG 291 

FAG ++ N L+ ++ QF GV VIL GVR ILGE+VPAF+G 

Sbjct: 250 GQDAAFKAFAGSG--GDPAADVGNYLMQSVMQGLQFGIGVAVILFGVRTILGELVPAFQG 307

Query: 292 ISEKLVPNSKPALDCPIVYPYAPNAVLIGFISSFVGGLVSMIVMI VTGTTVILPG 346 

1+ ++VP +KPALD PIV+PYA NAVLIGFI SF+GGL + +1 G ++LPG 

Sbjct: 308 IAGRWPGAKPALDAPIVFPYAQNAVLIGFIFSFLGGLTGLAALIWVFNPAFGLALVLPG 367

Query: 347 WPHFFCGATAGVIGNASGGVRGATIGAFVQGILISFLPIFLMPVLGGLGFKGSTFSDAD 406 

+VPHFF G AGV GNA+GG RGA +G+F+ G+LI+FLP L+ LG G +TF DAD 
Sbjct: 368 LVPHFFTGGAAGVYGNATGGRRGAAVGSFLNGLLITFLPAILLKALGSFGEANTTFGDAD 427 

Query: 407 FGLTGI ILGALNHVGGAIAI VTGIWILIGLFG 439 

FG G +LG++ + G ++ ++ L+ L G 
Sbjct: 428 FGWFGAVLGS IGKLDGTAGLIGMLI FGLLI LAG 460 



A related DNA sequence was identified in S. pyogenes <SEQ ID 515> which encodes the amino acid 
sequence <SEQ ID 516>. Analysis of this protein sequence reveals the following: 



Possible site: 34

>>> Seems to have a cleavable N-term signal seq.

[Eight INTEGRAL Likelihood entries; the likelihood values are missing. Surviving Transmembrane coordinates, in order: 269; 308 -; - 346 ( 315 -; - 243 ( 221 -; - 143 ( 126 -; - 285 ( 266 -; - 59 ( 41 -; - 114 ( 91 -]



Final Results

bacterial membrane --- Certainty=0.4333 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:CAB52363 GB:AL109747 putative integral membrane protein 
[Streptomyces coelicolor A3 (2) ] 
Identities = 162/387 (41%) , Positives = 245/387 (62%) , Gaps = 17/387 (4%) 

Query: 8 IRDILKEPAFLMGLIAFAGLVALKTPAHKVLTGTLGPILGYLMLVAGAGVIVTNLDPLAK 67 

+ +IL +PA+L+G+I GL ALK + + G + LG L++ AGAG++ ++LDPL + 
Sbjct: 12 VNEILSQPAYLIGIITAVGLAALKKSVGQTVGGAIKATLGLLLVGAGAGLVSSSLDPLGR 71 

Query: 68 LIEHGFSITGWPNNEAVTSVAQKILGVETMSILWGLLLNLAFARFTRFKYIFLTGHHS 127 

+1+ GV4P NEA+ +AQ G ++++G L++LA ARFT +Y4FLTGHH 

Sbjct: 72 MIQGTTGTHGVIPIMEAIVGIAQSEFGARVAWIJ'IILGFLVSLALARFTPLRYVFLTGHHM 131 



Query: 128 FFMACLLSAVLGAVGFKGSLLIIL-DGFLLGAWSAISPAIGQQYTLKVTDGDEIAMGHFG 186 
FMA LL+ V+ G +GS+ ++L G L+G PA +T KVT D +A+GHFG
Sbjct: 132 LF^TLLTIVMATAG-QGSVAVVLGGGVBVGIIiLVALPAFAHPWrKKVTGNDTLAIGHFG 190

Query: 187 SLGYXLSAWVGSKVGKDSKDTEDLQISEKWSFLRNTTISTGLIMVIPYLVAT VASVL 243 

+ GY +S G VGK+S+ TE++++ E FLR++ ++T Ii MV+ YLV + +A V 
Sbjct: 191 TAGYIVSGATGQLVGKWSRSTEEMKLPEGLRFLRDSMVATALSMVLIYLVMSLLFLAKVG 250 

Query: 244 RNASVAEELAAGQNP FIFAIKSGLTFAVGVAIVYAGVRMILADLIPAFQGIAN 296 

++A+ +G +P + ++ GIi F +GVA++ GVR IL +L+PAFQGIA 

Sbjct: 251 QDARFKAFAGSGGDPAADVGNYLMQS\'MQ3LQFGIGVAVIL?GVRTILGELVPAFQGIAG 310 

Query: 297 KLIPNAIPAVDCAVFFPYAPTAVIIGFASSFVGGLLGMLIL GVAGGVLIIPGMVP 351 

+++P A PA+D + FPYA AV+IGF SF+GGL G+ L G L++PG+VP 

Sbjct: 311 RVVPGAKPALDAPIVFPYAQNAVLIGFIFSFLGGLTGIAALIWFNPAFGLALVLPGLVP 370 

Query: 352 HFFCGATAEIFGNSTGGRRGAMIGASI, 378 

HFF G A ++GN+TGGRRGA +G+ L 
Sbjct: 371 HFFTGGAAGVYGNATGGRRGAAVGSFL 397 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 174/376 (46%) , Positives = 258/376 (68%) , Gaps = 2/376 (0%) 

Query: 1 MKGLLDFLVNIASTPAILVALIAIIGLVLQKKGVPDIVKGGIKTFVGFLWSGGTGIVQN 60 

M+ LL F+ +1 PA L+ LIA GLV K ++ G + +G+L++ G G++ 

Sbjct: 1 MEALLSFIRDIDKEPAFLMGLIAFAGLVALKTPAHKVLTGTLGPILGYLMLVAGAGVIVT 60 

Query: 61 SLNPFGKMFEHAFHLVGWPNNEAIVAVALTKYGSATALIMLAGMIFNILIARFTKFKYI 120 

+L+P K+ EH F + GWPNNEA+ +VA G T I++ G++ N+ ARFT+FKYI 
Sbjct: 61 NLDPIAKLIEHGFSITGWPNNEAVTSVAQKILGVETMSILWGLLIiNLAFARFTRFKYI 120 

Query: 121 FLTGHHTLYMACMIAVIFAVAGFTSFSLILFGGIALGIIMSVSPAFVQKYMIQLTGNDKv' 180 

FLTGHH+ +MAC+++ + GF LI+ G LG ++SPA Q+Y +++T D++ 
Sbjct: 121 FLTGHHSFFMACLLSAVLGAVGFKGSLLIILDGFLLGAWSAISPAIGQQYTLKVTDGDEI 180 

^ A+GHFGSLGY+LS ++G VG SK TED++ + SFLR++T+S + M I YL+ VA 

Sbjct: 181 AMGHFGSLGYYLSAWGSKVGKDSKDTEDLQISEKWSFLRNTTISTGLIMVIFYLVATVA 240 

Query: 239 VFAGEAYIAKEISNGVNGLVYALQLAGQFAAGVFVTLAGVRLILGEIVPAFKGISEKLVP 298 

A +A+E++ G N ++A++ FA GV ++ AGVR+IL +++PAF+GI+ KL+P 
Sbjct: 241 SVLRNASVAEEIAAGQNPFIFAIKSGLTFAVGVAIVYAGVRMILADLIPAFQGIANKLIP 300 

Query: 299 NSKPALDCPIVYPYAPNAVLIGFISSFVGGLVSMIVMIVTGTTVILPGWPHFFCGATAG 358 

N+ PA+DC + +PYAP AV+IGF SSFVGGL+ M+++ V G +I+PG+VPHFFCGATA 
Sbjct: 301 NAIPAVDCAVFFPYAPTAVIIGFASSFVGGLLGKLILGVAGGVLIIPGMVPHFFCGATAE 360 

Query: 359 VIGNASGGVRGATIGA 374 

+ GN++GG RGA IGA 
Sbjct: 361 IFGNSTGGRRGAMIGA 376 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 155 

A DNA sequence (GBSx0161) was identified in S.agalactiae <SEQ ID 517> which encodes the amino acid 
sequence <SEQ ID 518>. This protein is predicted to be transketolase, N-terminal subunit (tkt). Analysis of 
this protein sequence reveals the following: 
Possible site: 45 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3680 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



GB:D67515 transketolase 1 [Methanococcus jannaschii]
Identities = 106/269 (39%) , Positives = 158/269 (58%) , Gaps = 4/269 (1%) 



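[Sequence rows of this alignment are largely missing. Detached row labels and start coordinates, in order: Query 11; Sbjct 10; 71; Sbjct 70; 131; Sbjct 129; 191; Sbjct 189; 251; Sbjct 249. Surviving fragments in order:]

LY +M+ +P+
WEA AAH++L +LI F+D NK Q+DG T
D+ + GD AKFEAFG+D + G + E I ++ + + +PK 1+ ++KG+GV
H P+ +Q L+4AL L E
'HGKAPNEEQ LKQALEELSE 274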

A related DNA sequence was identified in S.pyogenes <SEQ ID 519> which encodes the amino acid 
sequence <SEQ ID 520>. Analysis of this protein sequence reveals the following: 
Possible site: 26 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -0.75 Transmembrane 58 - 74 ( 57 - 74)

Final Results

bacterial membrane --- Certainty=0.1298 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related sequence was also identified in GAS <SEQ ID 9165> which encodes the amino acid sequence 
<SEQ ID 9166>. Analysis of this protein sequence reveals the following: 

Possible site: 54 
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -0.75 Transmembrane 40 - 56 ( 39 - 55)

Final Results

bacterial membrane --- Certainty=0.130 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 82/246 (33%) , Positives = 129/246 (52%) , Gaps = 15/246 (6%) 

Query: 18 IRIjNTLETIOTLGFGHYGGSLSIVEALAVLYGDIMDINPEKFKE-SDRDYMVIjSKGHAGP 76 

+R +++ + GH G + VL+ M+INP+ + S+RD +LS GH 

Sbjct: 82 VRTLSMDAIQAANSGHPGLPMGAAPI'IAYVLWNHFMNINPKTSRNWSNRDRFILSAGHGSA 141 

Query: 77 ALYSTLYLKGF-FDKTFLHSLNTNGTKLPSHPDRNLTPGIDVTTGSLGQGISIATGIAYA 135 

LYS L+L G+ L + G+K P HP+ N T G++ TTG LGQGI+ A G+A A 

Sbjct: 142 MLYSLLHLAGYDLSVEDLKNFRQWGSKTPGHPEVNHTDGVEATTGPLGQGIANAVGMaMA 201 

Query: 136 QK IENSSYYTYTrVGDGELNEGQCWEAIQFAAHHQLHHLIVFVDDNKKQL 185 

+ + +YT+ + GDG+L EG EA A H +L L++ D N L 

Sbjct: 202 EAHLAAKFNKPGFDIvDHYTFALNGDGDLMEGVSQEAAS^GHLKLGKLVLLYDSNDISL 261 
Query: 186 DGLTADICNPGDFVAKFEAFC-FDAWVK-GDDIEAIDKAIKTFQDSNSVEPKCIVLDSIK 244

DG T+ + D +FEA+G+ + VK G+D+E I AI+ + + + +P I + +1 
Sbjct: 262 DGPTS-MAFTEDVKGRFEAYGWQHILVKDG^IDLEEIAAAIEAAK-AETEKPTIIEVKTII 319 

Query: 245 GQGVKE 250 

G G ++ 
Sbjct: 320 GFGAEK 325 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 156 

A DNA sequence (GBSx0162) was identified in S.agalactiae <SEQ ID 521> which encodes the amino acid
sequence <SEQ ID 522>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.27 Transmembrane 53 - 69 ( 53 - 69)

Final Results

bacterial membrane --- Certainty=0.1107 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9499> which encodes amino acid sequence <SEQ ID 9500>
was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB98674 GB:U67515 transketolase'' [Methanococcus jannaschii]
Identities = 100/301 (33%) , Positives = 171/301 (56%) , Gaps = 7/301 (2%) 

Query: 6 KEMRLVYRDFLLQANQENKQITVLEADLSSSMSTNALASEFGKRYINLGIMEAEMVGLAA 65
K MR Y + L++ ++ + + VL+ADLS ST A EF +R+ N G+ E M+G+AA
Sbjct: 9 KGMRKGYGETLIELGKKYENLVVLDADLSGSTQTAMFAKEFPERFFNAGVAEQNMIGMAA 68

Query: 66 GLAIKGYKPYLHTFGPFASRRVFDQVFLSLGYSQLSATIIGSDAGISAEMNGGTHMPFEE 125
GLA G + +F FAS R ++ + + Y +L+ 1+ + AGI+ +G +H E+
Sbjct: 69 GLATTGKIVFASSFSMFASGPAWEIIRNLVAYPKLNVKIVATHAGITVGEDGASHQMCED 128

Query: 126 FEVSDDIQFEAILKQTLSIDGLKYIRTIRKAPTAVYEGRE DFSK 181
Sbjct: 129

Query: 182
GIL G+D+T++A+G V A+ A + LKE GI A ++++ IKP+ EE+ D
Sbjct: 189 GKI-LVDGEDLTIIATGEEVPEALRAGEIIjKENC-ISAEIVEMATIKPIDEEIIKKSKD-F 246

Query: 242 IVTIENHNRIGGIGSALCEWL-SHEKDTTVSRMGIDERFGQVGQ^YLLEEYGLAVKDIVQ 301
+VT+E+H+ IGG+G A+ E + S + + R+GI++ FG+ G+ + LL+ YGL + I +
Sbjct: 247 VVTVEDHSIIGGLGGAVAEVIASNGLNKKLLRIGIMDVFGRSGKADELLKYYGLDGESIAK 307

There is also homology to SEQ ID 520. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
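
The start and end coordinates printed on an alignment row are tied together by the number of residues in that row, which gives a simple check when row labels have been displaced. The following is a minimal sketch with a hypothetical function name, assuming that gaps are written as "-" and that every letter in the row is one residue. For the first Query row of the Methanococcus jannaschii alignment in this Example, a start of 6 and 60 residues give an end of 65, matching the printed value.

```python
def end_coordinate(start, row):
    """Return the coordinate of the last residue in an alignment row.

    Gap characters ("-") and any other non-letters are not counted.
    """
    residues = sum(1 for ch in row if ch.isalpha())
    return start + residues - 1


query_row = "KEMRLVYRDFLLQANQENKQITVLEADLSSSMSTNALASEFGKRYINLGIMEAEMVGLAA"
sbjct_row = "KGMRKGYGETLIELGKKYENLVVLDADLSGSTQTAMFAKEFPERFFNAGVAEQNMIGMAA"

print(end_coordinate(6, query_row))
print(end_coordinate(9, sbjct_row))
```

A row whose computed end disagrees with the printed end has either lost residues in extraction or been paired with the wrong label.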

Example 157 

A DNA sequence (GBSx0163) was identified in S.agalactiae <SEQ ID 523> which encodes the amino acid 
sequence <SEQ ID 524>. Analysis of this protein sequence reveals the following: 



WO 02/34771 



PCT/GB01/04789 



Possible site: 24

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2517 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 158 

A DNA sequence (GBSx0164) was identified in S.agalactiae <SEQ ID 525> which encodes the amino acid 
sequence <SEQ ID 526>. Analysis of this protein sequence reveals the following: 
Possible site: 35

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -6.42 Transmembrane 119 - 135 ( 114 - 145)
INTEGRAL Likelihood = -5.10 Transmembrane 33 - 49 ( 32 - 50)
INTEGRAL Likelihood = -4.30 Transmembrane 94 - 110 ( 94 - 111)
INTEGRAL Likelihood = -3.66 Transmembrane 67 - 83 ( 60 - 83)

Final Results

bacterial membrane --- Certainty=0.3569 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8503> and protein <SEQ ID 8504> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 4
SRCFLG: 0
McG: Length of UR: 22
Peak Value of UR: 2.96
Net Charge of CR: 2
McG: Discrim Score: 10.55
GvH: Signal Score (-7.5): -4.31

Possible site: 22
>>> Seems to have an uncleavable N-term signal seq

Amino Acid Composition: calculated

ALOM program count: 6 value: -6.42 threshold: 0.0

INTEGRAL Likelihood = -6.42 Transmembrane 154 - 170 ( 149 - 180)
INTEGRAL Likelihood = -5.10 Transmembrane 68 -     ( 57 - 85)
INTEGRAL Likelihood = -5.04 Transmembrane 6 - 22 ( 2 - 24)
INTEGRAL Likelihood = -4.30 Transmembrane 129 - 145 ( 129 - 146)
INTEGRAL Likelihood = -3.66 Transmembrane 102 -     ( 95 - 118)
INTEGRAL Likelihood = -3.56 Transmembrane 29 - 45 ( 29 - 46)
PERIPHERAL Likelihood = 0.79 285











Final Results 

bacterial membrane --- Certainty=0.3569 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

ORP01868O91 - 1575 of 1938) 

GP|9946413|gb|AAG03934.1|AE004491_1|AE004491 (5 - 434 of 434) hypothetical protein
{Pseudomonas aeruginosa}
%Match =8.1

%Identity =26.1 %Similarity =48.6 

Matches = 105 Mismatches = 192 Conservative Sub.s = 91 

171 201 231 261 291 321 351 381 

DTTVSRMGIDERFGQVGQMEYLLEEYGLAVKDIVQHCKSIYKS*QKGNIGVAFLLFSEIFKFCISILWYFILTKKKGWV

M 



411 441 471 480 507 537 567 597 

MRAWKGIVLILSS I WTLVAWQNAGLSEFW PGLALTSL-SLTFLLSTKFRILESYFQGIENMYFYHKVMAVF

I = I:: =11 I I =11 1=1 :| II 11= s || : |s: || || :: 

KLLWGVIAAALAAWGLTIAVDPPASLDIWVWKQAILLTGV^^ 

20 30 40 50 60 70 80 

627 657 687 717 747 777

SMILLLLHKIGLGQGGHGSEF AKTIGSAGLYLFLSIVFVAYFGNFLKYEIWRFIHRFVYL 

:::| III : I : II =1 = = = :::| : : I :||>:|: : I 

AIVLGLLHYLLEIAGPWIAGIVGKPVKGPRWTFLDVFRGSAKELGEWSAWILGGMLLWLW-QRFPYHLTOYVHKALAL 
100 110 120 130 140 150 160 


807 837 867 897 924 951 981 1011 

AYILGLVHTFMILGDRILGNTLLSLIVLGYAVIGVISGFYIIFLYSRM-RFRR-VGYVQKVTHLNHDTTEIEIAMKRPYR 

|:= |: ::| : :| |::| == I 1= I II I I I 1= == s 

VYLVLAFHS - WLAPASYWSQPAGWLVAACALLGSACA- - LLSLSGRIGRTRRHAGVVTAVBRHGESLLEVTCRLQGDWS 
170 180 190 200 210 220 230



1041 1071 1101 1125 1155 1185 1215 1242 

YDYGQFTFFKI YQAGFESAAHPFS I SGGHDRV- - I FLTVKASGDYTKS I YKQLKVGTKIALDRAYGHMLFDKD - KKEQVW 
= III I 1111=1= = ===!! 1111= = 1=11 == == II I = III 



250 260 270 280 290 300 310 

1272 1293 1323 1353 1383 1413 1443 1461 

IAGGIGITPFISFI - - -RENSILTKRVDFFYTFSNQDNLIYQDMLBSYAKANPNFKLHLKWSSLKGRLDFSQ SVFE 

=1 111=1111=== = l== I I == I = 1= 11= I =1= =1 I 

VAAGIGVTPFIAWLESLQAAPESAPSVELHYCVKNSQEALFAGRLRELCEHLPSVTLHIRYSDEQGKPQAAQLGVLKSAE 
330 340 350 360 370 380 390 

1488 1518 1548 1575 1605 1635 1655 1695 

GQ-PTIFMCGPTSMTSTYAKVFRQKDAKSRLVY-EGFSFRDSWLSIFLLKTFDKVYSNLIK*EGL*DKPTFSWF*ECQS* 
1= l=== III = = = =l== II = I I I 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 159 

A DNA sequence (GBSx0165) was identified in S.agalactiae <SEQ ID 527> which encodes the amino acid 
sequence <SEQ ID 528>. This protein is predicted to be 30S ribosomal protein S15 (rpsO). Analysis of this 
protein sequence reveals the following: 

Possible site: 24 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4074 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB13541 GB:Z99112 ribosomal protein S15 (BS18) [Bacillus subtilis]
Identities = 55/89 (61%), Positives = 71/89 (78%) 

Query: 1 MA.ISKEKKNEIIAQYARHEGDTGSV3VQVAVLTWEIOTLNDHIKQHKKDHATYRGLMKKI 60 

MAI++E+KN++I ++ HE DTGS EVQ+A+LT IN+LN+H++ HKKDH + RGL+K + 
Sbjct: 1 r^ITQERKNQLINEFKTHESDTGSPEVQIAILTDSINKMIEHmTHKKDHHSRRGLLKIW 60 

Query: 61 GHRRNLLAYLRRTDVNRYRELIQSLGLRR 89 

G RRNLL YLR DV RYRELI LGLRR 
Sbjct: 61 GKRRNLLTYLRNKDVTRYRELINKLGLRR 89 

A related DNA sequence was identified in S.pyogenes <SEQ ID 529> which encodes the amino acid 
sequence <SEQ ID 530>. Analysis of this protein sequence reveals the following: 
Possible site: 41 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3746 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 88/89 (98%) , Positives = 88/89 (98%) 
Sbjct: 1 ^ISKEKKNEIIAQYARHEGDTGSVEVQVAVLTTOINHLNSHIKQHKKDHATYRGLMKKI t 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
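
The "Identities" figure of an alignment is the number of columns in which the Query and Sbjct rows carry the same residue. The following is a minimal sketch with a hypothetical function name, assuming the two rows are already aligned column by column and that "-" marks a gap. It is applied to the final block (Query 61-89) of the Bacillus subtilis alignment in this Example.

```python
def count_identities(query_row, sbjct_row, gap="-"):
    """Return (identical_columns, total_columns) for two aligned rows."""
    if len(query_row) != len(sbjct_row):
        raise ValueError("aligned rows must have the same number of columns")
    identical = sum(
        1 for q, s in zip(query_row, sbjct_row) if q == s and q != gap
    )
    return identical, len(query_row)


query = "GHRRNLLAYLRRTDVNRYRELIQSLGLRR"  # Query: 61 ... 89
sbjct = "GKRRNLLTYLRNKDVTRYRELINKLGLRR"  # Sbjct: 61 ... 89

identical, columns = count_identities(query, sbjct)
print("%d/%d" % (identical, columns))  # prints 22/29
```

The count of 22 agrees with the number of letters on the printed match line for this block; summing such counts over all blocks of an alignment gives the numerator of the "Identities" fraction.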

Example 160 

A DNA sequence (GBSx0166) was identified in S.agalactiae <SEQ ID 531> which encodes the amino acid
sequence <SEQ ID 532>. This protein is predicted to be polyribonucleotide nucleotidyltransferase (pnp). 
Analysis of this protein sequence reveals the following: 
Possible site: 46 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.64 Transmembrane 448 - 464 ( 448 - 464)

Final Results

bacterial membrane --- Certainty=0.1256 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9501> which encodes amino acid sequence <SEQ ID 9502> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC43595 GB:U29668 polynucleotide phosphorylase [Bacillus subtilis] 
Identities = 428/694 (61%), Positives = 532/694 (75%), Gaps = 4/694 (0%) 

Query: 7 KQVFEMIFAGKKLVVETGQVAKQANGSVVVRYGDSTVLTAAVMSKKMSTGDFFPLQVNYE 66 
Sbjct: 

Query: 67 EKMYAAGKFPGGFNKREGRPSTDATLTARLIDRPIRPMFAEGFRNEVQVINTVLSFDENA 126 

E++YA GK PGGF KREGRPS A L +RLIDRPIRP+FA+GFRNEVQVI + V+S D+N 
Sbjct: 65 ERLYAVGKIPGGFIKREGRPSEKAVLASRLIDRPIRPLFADGFRNEVQVISIVMSVDQNC 124 

Query: 127 SAPMAAMFGSSLALSISDIPFNGPIAGVQVAYVDGNFIINPTAQEQEASALELTVAGTKE 186 

S+ MAAMFGSSLALS+SDIPF GPIAGV V +D FIINPT + E S + L VAGTK+ 
Sbjct: 125 SSEMAAMFGSSLALSVSDIPFEGPIAGVTVGRIDDQFIINPTVDQLEKSDINLVVAGTKD 184 



Sbjct: 185 AINMVEAGADEVPEEIMEEAIMFGHEEIKRIiIAFQEEIVAAVGKEKSEIKLFEIDEEIjNE 244 

Query: 247 EIIATHNIALQAAVQVEEKKAREAATEAVKEVVIGEYEARYAEHEEYDRIMRDVAEILEQ 306 

++ A L A+QV EK ARE A VK V+ ++E EH+E ++ V +IL + 

Sbjct: 245 KVKALAEEDLLKAIQVF.EKHAREDAINEVKNAVVAKFEDE--EHDE--DTIKQVKQILSK 300 

Query: 307 MEHAEVRRLITEDKIRPDGRRVDEIRPLDAEIDFLPQVHGSGLFTRGQTQALSVLTLAPM 366 

+ EVRRLITE+K+RPDGR VD+IRPL +E+ LP+ HGSGLFTRGQTQALSV TL + 
Sbjct: 301 LVKNEVRRLITEEKVRPDGRGVDQIRPLSSEVGLLPRTHGSGLFTRGQTQALSVCTLGAL 360 

Query: 367 GEAQIIDGLTPEYKKRFMHHYNFPQYSVGETGRYGAAGRREIGHGALGERALEQVLPRLE 426 

G+ QI+DGL E KRFMHHYNFPQ+SVGETG GRREIGHGALGERALE V+P + 

Sbjct: 361 GDVQILDGLGVEESKRFMHHYNFPQFSVGETGPMRGPGRREIGHGALGERALEPVIPSEK 420 

Query: 427 EFPYAIRLVAEVLESNGSSSQASICAGTIALMAGGVPIKAPVAGIAMGLISDGTNYTVLT 486 

+FPY +RLV+EVLESNGS+SQASICA T1A+M GVPIKAPVAGIAMGL+ G +YTVLT 
Sbjct: 421 DFPYTTOLVSEVLESNGSTSQASIC^TIJ^DAGVPIXAPVAGIAMGLVKSGEHYTVLT 480 

Query: 487 DIQGLEDHFGDMDFKVAGTREGITALQMDIKIEGITPQILEEALAQAKKARFEILDVLHG 546 

DIQG+ED GDMDFKVAGT +G+TALQMDIKIEG++ +ILEEAL QAKK R EIL+ + 
Sbjct: 481 DIQGMEDALGDMDFKVAGTEKGVTALQMDIKIEGLSREILEEALQQAKKGRMEILNSMLA 540 

Query: 547 AIAEPRPQLAPTAPKIDMIKIDVDKIKVVIGKGGETIDKIIAETGVKIDIDEEGNVSIFS 606 

++E R +L+ APKI + I+ DKI+ VIG G+ I+KII ETGVKIDI+++G + I S 
Sbjct: 541 TLSESRKELSRYAPKILTMTINPDKIRDVIGPSGKQINKIIEETGVKIDIEQDGTIFISS 600 

Query: 607 SDQAAIDRTKDIIASLVREAKVGEVYHAKVVRIEKFGAFVNLFDKTDALVHISEIAWTRT 666 

+D++ + K II LVRE +VG++Y KV RIEKFGAFV +F D LVHISE+A R 
Sbjct: 601 TDESGNQKAKKIIEDLTOEVEVGQLYK3KVKRIEKFGAFVEIFSGKDGLVHISELALERV 660 

Query: 667 ANVADVLEIGEEVDVKVIKIDDKGRVDASMKALL 700 

V DV++IG+E+ VKV +ID +GRV+ S KA+L 
Sbjct: 661 GKVEDVVKIGDEILVKVTEIDKQGRVNLSRKAVL 694 

A related DNA sequence was identified in S.pyogenes <SEQ ID 533> which encodes the amino acid 
sequence <SEQ ID 534>. Analysis of this protein sequence reveals the following: 

Possible site: 28 
>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -0.64 Transmembrane 444 - 460 ( 444 - 460) 

Final Results 
bacterial membrane --- Certainty=0.1256 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 631/708 (89%), Positives = 664/708 (93%), Gaps = 2/708 (0%) 

Query: 5 MSKQVFEMIFAGKKLVVETGQVAKQANGSVVVRYGDSTVLTAAVMSKKMSTGDFFPLQVN 64 
         MSKQ F   FAGK LVVE GQVAKQANG+ VVRYGDSTVLTAAVMSKKM+TGDFFPLQVN 
Sbjct: 1 MSKQTFTTTFAGKPLVVEVGQVAKQANGATVVRYGDSTVLTAAVMSKKMATGDFFPLQVN 60 

Query: 65 YEEKMYAAGKFPGGFNKREGRPSTDATLTARLIDRPIRPMFAEGFRNEVQVINTVLSFDE 124 

YEEKMYAAGKFPGGF KREGRPSTDATLTARLIDRPIRPMFAEGFRNEVQVINTVLS+DE 
Sbjct: 61 YEEKMYAAGKFPGGFMKREGRPSTDATLTARLIDRPIRPMFAEGFRNEVQVINTVLSYDE 120 

Query: 125 NASAPMAAMFGSSLALSISDIPFNGPIAGVQVAYVDGNFIINPTAQEQEASALELTVAGT 184 

NASAPMAAMFGSSLALSISDIPFNGPIAGVQV Y+DG FIINP ++ EAS LELTVAG+ 
Sbjct: 121 NASAPMAAMFGSSLALSISDIPFNGPIAGVQVGYIDGEFIIKPDKEQMEASLLELTVAGS 180 

Query: 185 KEAINMVESGAKELSEEIMLEALLKGHEAVCELIAFQEEIVTAIGKEKAEVELLQVDPEL 244 

KEAINMVESGAKELSE+IMLEALLKGH+A+ ELIAFQE+IV +GKEKAEVELLQVD +L 
Sbjct: 181 KEAINMVESGAKELSEDIMLEALLKGHQAIQELIAFQEQIVAVVGKEKAEVELLQVDVDL 240 

Query: 245 QAEIIATHNIALQAAVQVEEKKAREAATEAVKEVVIGEYEARYAEHEEYDRIMRDVAEIL 304 

QA+I+A +N LQ AVQVEEKKAREAATEAVKE+V EYE RYAE E IMRDVAEIL 
Sbjct: 241 QADIVAKYNAQLQKAVQVEEKKAREAATEAVKEMVKAEYEERYAEDENLATIMRDVAEIL 300 

Query: 305 EQMEHAEVRRLITEDKIRPDGRRVDEIRPLDAEIDFLPQVHGSGLFTRGQTQALSVLTLA 364 

EQMEHAEVRRLITEDKIRPDGR++DEIRPLDA +DFLP+VHGSGLFTRGQTQALSVLTLA 
Sbjct: 301 EQMEHAEVRRLITEDKIRPDGRKIDEIRPLDAVVDFLPKVHGSGLFTRGQTQALSVLTLA 360 

Query: 365 PMGEAQIIDGLTPEYKKRFMHHYNFPQYSVGETGRYGAAGRREIGHGALGERALEQVLPR 424 

PMGE QIIDGL PEYKKRF+HHYNFPQYSVGETGRYGAAGRREIGHGALGERALEQVLP 
Sbjct: 361 PMGETQIIDGLAPEYKKRFLHHYNFPQYSVGETGRYGAAGRREIGHGALGERALEQVLPS 420 

Query: 425 LEEFPYAIRLVAEVLESNGSSSQASICAGTLALMAGGVPIKAPVAGIAMGLISDGTNYTV 484 

LEEFPYAIRLVAEVLESNGSSSQASICAGTIALMAGGVPIKAPVAGIAMGLISDGTNYTV 
Sbjct: 421 LEEFPYAIRLVAEVLESNGSSSQASICAGTLALMAGGVPIKAPVAGIAMGLISDGTNYTV 480 

Query: 485 LTDIQGLEDHFGDMDFKVAGTREGITALQMDIKIEGITPQILEEALAQAKKARFEILDVL 544 

LTDIQGLEDHFGDMDFKVAGTREGITALQMDIKI GITPQILEEALAQAKKARFEILDV+ 
Sbjct: 481 LTDIQGLEDHFGDMDFKVAGTREGITALQMDIKIAGITPQILEEALAQAKKARFEILDVI 540 

Query: 545 HGAIAEPRPQLAPTAPKIDMIKIDVDKIKVVIGKGGETIDKIIAETGVKIDIDEEGNVSI 604 

IAEPRP+LAPTAPKID IKIDVDKIKVVIGKGGETIDKIIAETGVKIDID+EGNVSI 
Sbjct: 541 EATIAEPRPELAPTAPKIDTIKIDVDKIKVVIGKGGETIDKIIAETGVKIDIDDEGNVSI 600 

Query: 605 FSSDQAAIDRTKDIIASLVREAKVGEVYHAKVVRIEKFGAFVNLFDKTDALVHISEIAWT 664 

+SSDQAAIDRTK+IIA LVREAKVGEVYHAKVVRIEKFGAFVNLFDKTDALVHISEIAWT 
Sbjct: 601 YSSDQAAIDRTKEIIAGLVREAKVGEVYHAKVVRIEKFGAFVNLFDKTDALVHISEIAWT 660 

Query: 665 RTANVADVLEIGEEVDVKVIKIDDKGRVDASMKALLPRPPKADNPKKE 712 
RT NV+DVLE+GE+VDVKVIKID+KGRVDASMKAL+PRPPK + KKE 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 161 

A DNA sequence (GBSx0167) was identified in S.agalactiae <SEQ ID 535> which encodes the amino acid 
sequence <SEQ ID 536>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 
bacterial cytoplasm --- Certainty=0.1293 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 537> which encodes the amino acid 
sequence <SEQ ID 538>. Analysis of this protein sequence reveals the following: 
Possible site: 38 

Final Results 
bacterial membrane --- Certainty=0.1171 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
An alignment of the GAS and GBS proteins is shown below: 

Identities = 172/248 (69%), Positives = 211/248 (84%) 

Query: 1 MTSTNELDIRLRAFINAPDNFLDSIGLVNALHHSTVWASKEPYAIQVDGQEWPVFTDIT 60 

MT +NELDIRLRAFINAPDNFLDS+ LVNA H+ VWA+KEPY I+V+G +V PVFTD 
Sbjct: 1 MTKSNELDIRLRAFINAPDNFLDSLALVNAFHNFPvWAAKEPYVIEVEGVKVTPVFTDKE 60 

Query: 61 DLNHFKEEQESARDMFWESRRSLDVLDEAISHGIAGLVYNLKKEGDFGNSTIFYCEDMVQ 120 

D+ FKEEQ+SA+ +W R +L VL+E I+ G AGL++NLKK+GDFGNSTIF DM+Q 
Sbjct: 61 DMARFKEEQKSAQSQYWLERSALAVLEEVITSGAAGLIFNLKKKGDFGNSTIFKSSDMIQ 120 

Query: 121 FMNNYTTILNQLLNEDNIVADIMDKTYLVPAFVHPREEGSFDRLFPTMSTPEGKSYVPVF 180 

FMN+YTT+LN L+++DN+ AD M+K YLVPAFV+P++ +DRLFPTMSTPEGKSYVP F 
Sbjct: 121 FMNHYTTVLNTLMSDDNVAADTMEKVYLVPAFVYPKDNNHYDRLFPTMSTPEGKSYVPAF 180 

Query: 181 SNLLSFEKWYNHNDFGGAFRKAQGVILAWTIDDIYKPRNGENEIDDTFGVAINPFDEQQV 240 

SNL SF KWYN +DFGG FRKA+GVIL WTIDDIY+PRNGENE+D+TFGVAINPFD+QQ+ 
Sbjct: 181 SNLQSFAKWYNQDDFGGLFRKAEGVILTWTIDDIYQPRNGENELDETFGVAINPFDDQQI 240 

Query: 241 LVDWSDVE 248 

LVDWS+++ 
Sbjct: 241 LVDWSELD 248 
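
The identity count reported for an alignment is the number of aligned columns in which both rows carry the same residue. A minimal Python sketch of that count is given below, applied to the final row of the alignment above. The function name `count_identities` is an assumption introduced for illustration; the sketch treats gap characters as non-matching and does not attempt to reproduce the "Positives" score, which depends on a substitution matrix.

```python
def count_identities(query, subject):
    """Count aligned columns where both rows carry the same residue."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    # A gap ("-") never counts as an identity, even if both rows show one.
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")


# Final row of the alignment above (Query 241-248 against Sbjct 241-248).
print(count_identities("LVDWSDVE", "LVDWSELD"))
```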

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 162 

A DNA sequence (GBSx0168) was identified in S.agalactiae <SEQ ID 539> which encodes the amino acid 
sequence <SEQ ID 540>. This protein is predicted to be serine acetyltransferase (cysE). Analysis of this 
protein sequence reveals the following: 

Possible site: 39 
>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -2.02 Transmembrane 150 - 166 ( 147 - 168) 

Final Results 
bacterial membrane --- Certainty=0.1808 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9503> which encodes amino acid sequence <SEQ ID 9504> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB71304 GB:AJ130879 serine acetyltransferase [Clostridium sticklandii] 
Identities = 92/169 (54%), Positives = 125/169 (73%) 

Query: 9 KESIAIVKEQDPAARSSLEVILTYPGIKAIAAHRLSHFLWNHNFKLLARMHSQFWRFWTQ 68 
KE+I + +E+DPAA+ ++ +4+ PGI A+ HR++H L+N +AR+ SQ RF T 
Sbjct: 20 KETIEVaREKDPAAKGAINILWrPGIHAIMFHRVAHSLYlTOKHFFIARLISQISRFLTG 79 

Query: 69 IEIHPGATISEGVFIDHGSGLVIGETAIVEKGAMLYHGVTLGGTGKDKGKRHPTIRKGAL 128 

IEIHPGA I FIDHG G+VIGETA + ML+H VTLGGTGKDKGKRHPT+ + 
Sbjct: 80 IEIHPGAQIGRRFFIDHGMGVVIGETAEIGDDVMLFHQVTLGGTGKDKGKRHPTVENNVI 139 

Query: 129 ISAHSQIIGPIEVGENAKVGAAAVVLADVPADVTVVGVPAKVVRVHGQK 177 

ISA +++GPI +GEN+K+GA AVVL D+P + T VG+PAKVVR++G+K 
Sbjct: 140 ISAGVKVLGPIVIGENSKIGANAVVLHDIPKKATAVGIPAKVVRLNGEK 188 

A related DNA sequence was identified in S.pyogenes <SEQ ID 541> which encodes the amino acid 
sequence <SEQ ID 542>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 
bacterial cytoplasm --- Certainty=0.0141 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 162/193 (83%), Positives = 178/193 (91%) 

Query: 5 MGWWKESIAIVKEQDPAARSSLEVILTYPGIKALAAHRLSHFLWNHNFKLLARMHSQFWR 64 

MGWWKESIAIVK DPAAR+SLEVILTYPGIKALAAHRLSHFLW H+FKLLARMHSQFWR 
Sbjct: 1 MGWWKESIAIVKALDPAARNSLEVILTYPGIKAIAAHRLSHFLWRHHFKLLARMHSQFWR 60 

Query: 65 FWTQIEIHPGATISEGVFIDHGSGLVIGETAIVEKGAMLYHGVTLGGTGKDKGKRHPTIR 124 

FWTQIEIHPGA I+ GVFIDHG+GLVIGETAIVEKG MLYHGVTLGGTGKD GKRHPT+R 
Sbjct: 61 FWTQIEIHPGAQIAPGVFIDHGAGLVIGETAIVEKGVMLYHGVTLGGTGKDCGKRHPTVR 120 

Query: 125 KGALISAHSQIIGPIEVGENAKVGAAAVVLADVPADVTVVGVPAKVVRVHGQKDDLQIRS 184 

+GALISAH+Q+IGPI++G NAKVGAAAVVL+DVP DVTVVGVPAK+VRVHGQKD+ QI+S 
Sbjct: 121 QGALISAHAQVIGPIDIGANAKVGAAAVVLSDVPEDVTVVGVPAKIVRVHGQKDNRQIQS 180 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 163 

A DNA sequence (GBSx0169) was identified in S.agalactiae <SEQ ID 543> which encodes the amino acid 
sequence <SEQ ID 544>. Analysis of this protein sequence reveals the following: 

Possible site: 29 
>>> May be a lipoprotein 
INTEGRAL Likelihood = -5.89 Transmembrane 32 - 48 ( 29 - 49) 

Final Results 
bacterial membrane --- Certainty=0.3357 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 164 

A DNA sequence (GBSx0170) was identified in S.agalactiae <SEQ ID 545> which encodes the amino acid 
sequence <SEQ ID 546>. This protein is predicted to be cysteinyl-tRNA synthetase (cysS). Analysis of this 
protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 
bacterial cytoplasm --- Certainty=0.2227 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB11870 GB:Z99104 cysteinyl-tRNA synthetase [Bacillus subtilis] 
Identities = 246/465 (52%), Positives = 322/465 (68%), Gaps = 23/465 (4%) 

Query: 2 IKIYDTMTRSLQDFIPLNEGKVNMYVCGPTVYNYIHIGNARSVVAFDTIRRYFEYCGYQV 61 
I +Y+T+TR + F+PL EGKV MYVCGPTVYNYIHIGNAR + +DT+R Y EY GY V 
Sbjct: 3 ITLYNTLTRQKETFVPLEEGKVKMYVCGPTVYNYIHIGNARPAIVYDTVRNYLEYKGYDV 62 

Query: 62 
Y+SNFTDVDDK+IK A E G D + S + + FI A+ EDV ALG + A +PRV++ MD I 
Sbjct: 63 

Query: 122 
I+FV LV K +AYE+ GDVYF+ Y KL+ +++++L GA 
Sbjct: 123 

Query: 182 
DFALWK+AK GE+SW+SPWGKGRPGWKIECS M + LGD IDIH GG DL FPHH NEI 
Sbjct: 180 

Query: 242 
AQSEA TGKTFA YW4HNG++N+DNEKMSKSLGNF+ VHD++K 
Sbjct: 240 

Query: 302 
Sbjct: 300 

Query: 351 ilTVIFEMAKWIN SGHYTSRVKETFAELLEIFGI-VFQEEVLDAD 401 
I+V+F++AK N +H + EF++ + G+ ++E+LD + 
Sbjct: 360 DDFNTANAISvXjFDIiAKimNYYLQKDHTADHVITAFIEMFDRIVSVLGFSLGEQELLDQE 419 

Query: 402 IESLIEQRQEARANRDFATADRIRDELAKQGIKLLDTKDGVRWTR 446 
IE LIE+R EAR NRDFA +D+IRD+L I L DT GRUB 
Sbjct: 420 IEDLIEKRNEARRNRDFALSDQIRDQLKSMNIILEDTAQGTRWKR 464 

A related DNA sequence was identified in S.pyogenes <SEQ ID 547> which encodes the amino acid 
sequence <SEQ ID 548>. Analysis of this protein sequence reveals the following: 

Possible site: 46 
>>> Seems to have no N-terminal signal sequence 

Final Results 
bacterial cytoplasm --- Certainty=0.1765 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 357/447 (79%), Positives = 401/447 (88%) 

Query: 1 MIKIYDTMTRSLQDFIPLNEGKVNMYVCGPTVYNYIHIGNARSVVAFDTIRRYFEYCGYQ 60 

MIKIYDTMTRSL+ F+PB E W+YVCGPTVYNYIHIGNARS VAFDTIRRYFEY GYQ 
Sbjct: 1 MIKIYDTMTRSLRKFVPLTEOTWIWCX3PTVYMYIHIGNARSAVAFDTIRRYFEYTGYQ 60 

Query: 61 VNYISNFTDVDDKIIKGAAEAGMDTKSFSDKFISAFMEDVAALGVKPATKNPRVIDYMDE 120 

VNYISNFTDVDDKIIK A +AG+ K SD+FI+AF+ED ALGVKPAT+NPRV+DY+ E 
Sbjct: 61 VNYISNFTDVDDKIIKAATQAGVSPKELSDRFIAAFIEDTKALGVKPATQNPRVMDYIAE 120 

Query: 121 IIDWK^VDKEFAYFJOTGDVYFRVSKSHHYAKLANKTLEDLEIGASGRVDGEGEIKENP 180 

II FV4 L++K+FAYEA4GDVYFRV ICS HYAKLANKTL +LE+GASGR D E +KENP 
Sbjct: 121 IISFVESLIEKDFAYEADGDVYFRVEKSEHYAKIANKTLSELEVGASGRTDAETALKENP 180 

Query: 181 LDFALWKSAKSGEVSWESPWGKGRPGWHIECSVI>lATEILGDTIDIHGGGADLiEFPHHTKE 240 

LDFALWKSAK+GEVSW+SPWG GRPGWHIECSVMATEILGDTIDIHGGGADLEFPHHTNE 
Sbjct: 181 LDFALWKSAKAGEVSWDSPWGFGRPGWHIECSVMATEILGDTIDIHGGGADLEFPHHTNE 240 

Query: 241 IAQSEAKTGKTFAIOTmHNGFVNVD^KMSKSLGNFITVHDMLKSVDGQVIRFFLATQQY 300 

IAQSEAKTGKTFANYWMHNGFV VDNEKMSKSLGNF+TVHDML++VDGQV+RFFLATQQY 
Sbjct: 241 IAQSEAKTGKTFAKTYWMHNGFVTVDNEKMSKSI^^ 3 00 

Query: 301 RKPWFTEKAVHDAEVNLKYLKlOTFNLPIQENAlSroEELEQFVKAFQGAMDDDFNTANGIT 360 

RKP+NFTEK +HDAE+NLKYLKNT P+ E A+++EL+QFV AFQ AMDDDFNTANGIT 
Sbjct: 301 RKPINFTEKTIHDAEINLKYLKNTLQQPLTETADEQELKQFVIAFQDAMDDDFNTANGIT 360 

Query: 361 VIFEMAKWINSGHYTSRVKETFAELLEIFGIVFQEEVLDADIESLIEQRQEARANRDFAT 420 

V+F+MAKWINSG YT VK F ++L +FGI + F+EEVL+ DIE+LI +RQEARANRDFAT 
Sbjct: 361 VVFDMAKWINSGSYTEPVKSAFEICM»VFGIIFEEEVLEVDIEALIAKKQEARANMFAT 420 

Query: 421 ADRIRDELAKQGIKLLDTKDGVRWTRD 447 

AD IRD+LA QGIKLLDTKDGVRW RD 
Sbjct: 421 ADAIRDQLAVQGIKLLDTKDGVRWLRD 447 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 165 

A DNA sequence (GBSx0171) was identified in S.agalactiae <SEQ ID 549> which encodes the amino acid 
sequence <SEQ ID 550>. Analysis of this protein sequence reveals the following: 

Possible site: 53 
>>> Seems to have no N-terminal signal sequence 

Final Results 
bacterial cytoplasm --- Certainty=0.0259 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9505> which encodes amino acid sequence <SEQ ID 9506> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

[Bacillus subtilis] 

Query: 3 DVRLINGIALAFEGDAVYSLYIRRHLIMQGFTKPNQLHRKATQYVSANAQALLINAMLEE 62 

D + +NG+ALA+ GDA++ +Y+R HL+ QGFTKPN LH+K+++ VSA +QA ++ + + 
Sbjct: 9 DSKQMGLAIAYIGDAIFEVmiHffl^K^ 68 

Query: 63 NILTDEEQLIYKRGRNANSHTKAKNADIITYRMSTGFEALMGYLDMTGQIKRLETLIQWC 122 
+ T+EE+ + KRGRNA ST KN D+ TYR ST FEAL+GYL + + +RL L+ 
Sbjct: 69 SFFTEEEEAVLKRGRNAKSGTTPKNTDVQTYRYSTAFEALLGYLFLEKKEERLSQLVAEA 128 

Query: 123 IE 124 

I+ 

Sbjct: 129 IQ 130 

A related DNA sequence was identified in S.pyogenes <SEQ ID 551> which encodes the amino acid 
sequence <SEQ ID 552>. Analysis of this protein sequence reveals the following: 

Possible site: 56 
>>> Seems to have no N-terminal signal sequence 

Final Results 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 99/127 (77%), Positives = 111/127 (86%) 

Query: 2 IDVRLINGIALAFEGDAVYSLYIRRHLIMQGFTKPNQLHRKATQYVSANAQALLINAMLE 61 

+DV LINGIALAFEGDAVYS Y+RRHLI QG TKP+QLHR AT+YVSA AQA LI AMLE 
Sbjct: 5 VDVNLINGIALAFEGDAVYSYYVRRHLIFQGKTKPSQLHRLATRYVSAKAQANLIQAMLE 64 

Query: 62 ENILTDEEQLIYKRGRNANSHTKAKNADIITYRMSTGFEALMGYLDMTGQIKRLETLIQW 121 

+LT++E+ IYKRGRN NSHTKAKNADIITYRMSTGFEA+MGYLDM GQ +RLE LI+W 
Sbjct: 65 AQLLTEKEEDIYKRGRNTNSHTKAKNADIITYRMSTGFEAIMGYLDMMGQKERLEELIRW 124 

Query: 122 CIETIEK 128 
CIE +EK 
Sbjct: 125 CIEYVEK 131 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 166 

A DNA sequence (GBSx0172) was identified in S.agalactiae <SEQ ID 553> which encodes the amino acid 
sequence <SEQ ID 554>. This protein is predicted to be spoU rRNA methylase family protein. Analysis of 
this protein sequence reveals the following: 

Possible site: 30 
>>> Seems to have no N-terminal signal sequence 

Final Results 
bacterial cytoplasm --- Certainty=0.1478 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB11872 GB:Z99104 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 113/244 (46%), Positives = 163/244 (66%), Gaps = 6/244 (2%) 

Query: 11 ESSDLVYGLHAVTESLRANTG-NKLYLQDDLRGKNVDKVKALATEKKVSISWTPKKTLSD 69 

+ D V G +AV E+L+++ KL++ ++ +V LA ++ ++I + P+K L 

Sbjct: 3 QQHDYVTGKNAVIETLKSDRKLYKIjWMAEbmra^ 62 

Query: 70 MTNGGVHQGFVLKVSEFAYADLSEIMTKAENE-ENPLILILDGLTDPHNLGSILRTADAT 128 

M G HQG V +V+ + YA+L ++ AE + E P LILD L DPHNLGSI+RTADA 
Sbjct: 63 ^OT , GQ-HQGVVAQVAAYEYAELDDLYKA^EKNEQPFFLILDELEDPHNLGSIMRTADAV 121 

Query: 129 NVTGIIIPKHRSVGVTPVVSKTSTGAVEHVPIARVTNLSQTLDTLKDKEFWIFGTDMNGT 188 

GI+IPK R+VG+T V+K STGA+EH+P+ARVTNL++TL+ +K++ W+ GTD + 
Sbjct: 122 GAHGIVIPKRRAVGLTTTVAKASTGAIEHIPVARVTNLARTLEEMKERGIWVVGTDASAR 181 

Query: 189 PSHKWTKGK--LALVIGNEGKGISHNIKKQVDEMITIP^GHVQSlimSVAAAILMYEV 246 

+ N G LALVIG+EGKG+ +K++ D +1 +PM G V SLNASVAA +LMYEV 
Sbjct: 182 EDFR-NMDGMVIPLALVIGSEGKGMGRLVKEKCDFLIKLPMAGKVTSLNASVAAGLIjMYEV 240 

Query: 247 PRNR 250 
+R R 

Sbjct: 241 YRKR 244 

A related DNA sequence was identified in S.pyogenes <SEQ ID 555> which encodes the amino acid 
sequence <SEQ ID 556>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 
bacterial cytoplasm --- Certainty=0.1037 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 205/248 (83%), Positives = 225/248 (90%), Gaps = 1/248 (0%) 

Query: 3 MKDKQFKEESSDLVYGLHAVTESLRANTGNKLYLQDDLRGKNVDKVKALATEKKVSISWT 62 
M+DK E++D+VYG+HAVTESL+ANTGNKLY+Q+DLRGK VD +K+LAT+KKV+ISWT 
Sbjct: 10 MEDKD-TIETNDIVYGVHAVTESLQANTGNKLYIQEDLRGKKVDNIKSLATQKKVAISWT 68 

Query: 63 
Sbjct: 69 

Query: 123 
RTADATNV G+IIPKHRSVGVTPVVSKTSTGAVEH+PIARVTNLSQTLD LK + FWIFG 
Sbjct: 129 

Query: 183 
TDMNGTPS WNT GKLALVIGNEGKGIS NIKKQVDEMITIPMNGHVQSLNASVAAAIL 
Sbjct: 189 

Query: 243 
Sbjct: 249 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 167 

A DNA sequence (GBSx0173) was identified in S.agalactiae <SEQ ID 557> which encodes the amino acid 
sequence <SEQ ID 558>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 
bacterial cytoplasm --- Certainty=0.2187 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 






>GP:CAB11873 GB:Z99104 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 67/147 (45%), Positives = 94/147 (63%), Gaps = 2/147 (1%) 

Query: 6 ILLVDGYNMIAFWKDTRQLFKSNRLEEAREVLLRKLNHYAHFEHIDIICVFDAQYVPGVR 65 

ILLVDGYNMI W + L K+N EEAR+VL++K+ Y + +I VFDA V G+ 
Sbjct: 3 ILLVDGYNMIGAWPQLKDL-KANSFEEARDVLIQKMAEYQSYTGNRVIVVFDAHLVKGLE 61 

Query: 66 QRYDQYKISVIFTEEDETADSYIERAAAELNQSVIJ^VSVATSDl^QWTIFSCGALRVS 125 

++ +++ VIFT+E+ETAD IE+ A Hf ++ + VATSD EQW IF QGALR S 
Sbjct: 62 KKQTNHRVEVIFTKENETADERIEKLAQAUI-N1ATQIHVATSDYTEQWAIFGQGALRKS 120 

Query: 126 ARELEQRVATVKSDLDKMSSQIDLSTP 152 

AREL + V T++ ++4 +1 P 
Sbjct: 121 ARELLREVETIERRIERRVRKITSEKP 147 

A related DNA sequence was identified in S.pyogenes <SEQ ID 559> which encodes the amino acid 
sequence <SEQ ID 560>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 
bacterial cytoplasm --- Certainty=0.2465 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 130/167 (77%), Positives = 149/167 (88%), Gaps = 1/167 (0%) 

Query: 3 KHSILLVDGYNMIAFWKDTRQLFKSNRLEEAREVLLRKLNHYAHFEHIDIICVFDAQYVP 62 

K ILLVDGYNMIAFW+ TRQLFK+N+L++AR LL KLNHYAHFE+I+IICVFDAQYVP 
Sbjct: 2 KKRILLVDGYNMIAFWQSTRQLFKTNQLDQARWTLLTKLNHYAHFENINIICVFDAQYVP 61 

Query: 63 GWQRYDQYKISVIFTEEDETADSYIERAAftELNQSvIiNLVSVATSDLNEQWTIFSCGAL 122 

G+RQRYDQY ISV+FTEEDETADSYIER AAELN + +++V VATSDLNEQWTI FSQGAL 
Sbjct: 62 GLRQRYDQYYISWFTEEDETADSYIERMAAEm-TAIH^WEVATSDI J NEQWTIFSQGAL 120 

Query: 123 RVSARELEQRVATVKSDLDKMSSQIDLSTPKLRPKNDEQLGKLKDFL 169 

RV+ARELEQRV TVK+DLDKMS IDL TPKLRP++ QL +LKDF+ 
Sbjct: 121 RVTARELEQRVHTVKADLDKMSRDIDLKTPKLRPFDQGQLIQLKDFM 167 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 168 

A DNA sequence (GBSx0174) was identified in S.agalactiae <SEQ ID 561> which encodes the amino acid 
sequence <SEQ ID 562>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 
bacterial cytoplasm --- Certainty=0.4889 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 



Query: 1 MTFKILTDSTSDLDEKWAQEHNVDIIGLTIELDGKTYETVGDEKITSDFLLERMQEGAKP 60 
MT ++ DS +DL ++E + IL + L K+E I+D + EMQG P 
Sbjct: 1 MTVHLIADSATDLPRSYFEEKGIGFIPLRVSLGDKEFEDA--VTIHADQIFEAMQNGETP 58 

Query: 61 TTSQINVGQFEEVFSTYAENDHALLYLALSSHLSGTYQSATIAREMVLDKYPDAQIEIVD 120 
TSQ + + VF YAE LY+A SS LSGTYQ+A + V +++PD + ++D 
Sbjct: 59 KTSQASPQTIKNVFLQYAETGDPALYIAFSSGLSGTYQTAVMIANEVKEEFPDFDLRVID 118 

Query: 121 TMAASCGEGVIAMLATKERQEGKSLEEVKQKIESLLPKLNTYFLVDDLNHLMRSGRLSKG 180 
+ AS G G+ A G +++E++ +++ +L F VDDL +L R GR+SK 
Sbjct: 119 SKCASLGYGIAVRHAADLCINGNTIQEIETSVKNFCSQLEHIFTVDDLTYLARGGRISKT 178 

Query: 181 AAIIGSVAKIKPLLKLDSEGKLVPFAKTRGRKKGIK---EIVTQATKTLSYSTLIIAYSG 237 
+A +G + IKPLL+++ +GKLVP K RG+KK K E++ + S T+ I+Y+ 
Sbjct: 179 

Query: 238 
+ +E+I+ P+ I +H G G IA+F L 
Sbjct: 238 

A related DNA sequence was identified in S.pyogenes <SEQ ID 563> which encodes the amino acid 
sequence <SEQ ID 564>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 
bacterial cytoplasm --- Certainty=0.3247 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 167/286 (58%), Positives = 227/286 (78%) 

Query: 1 MTFKILTDSTSDLDEKWAQEHNVDIIGLTIELDGKTYETVGDEKITSDFLLERMQEGAKP 60 
MTF I+TDST+DL+4 WA++H++ +IGLTI DG+ YETVG +I+SD+LL4+M+ G+ P 
Sbjct: 1 MTFTIMTDSTADLNQTWAEDHDIVLIGLTILCDGEVYETVGPNRISSDYLLKKMKAGSHP 60 

Query: 61 TTSQINVGQFEEVFSTYAENDHALLYLALSSHLSGTYQSATIAREMVLDKYPDAQIEIVD 120 

TSQINVG+FE+VF +A N+ ALLYLA SS LSGTYQSA +AR++V + YPDA IEIVD 
Sbjct: 61 QTSQINVGEFEKVFREHARNNKALLYLAFSSVLSGTYQSALMARDIiVREDYPDAVIEIVD 120 

Query: 121 TMAASCGEGVIAMLATKERQEGKSLEEVKQKIESLLPKLNTYFLVDDLNHLMRSGRLSKG 180 

T+AA+ GEG L +LA + R GK+L E K +E+++P+L TYFLVDDL HLMR GRLSKG 
Sbjct: 121 TIAASlGGEGYLTILAAEARDSGKNLLETKDIVEAVIPRLRTYFLVDDLFHLMRGGRLSKG 180 

Query: 181 AAIIGSVAKIKPLLKLDSEGKLVPFAKTRGRKKGIKEIVTQATKTLSYSTLIIAYSGEKD 240 

+A +GS+A IKPLL +D EGKLVP AK RGR+K IKE+V Q K ++ ST+I++Y+ ++ 
Sbjct: 181 SAFLGSLASIKPLLWIDEEGKLVPIAKIRGRQKAIKEMVAQVEKDIADSTVIVSYTSDQG 240 

Query: 241 SAQVMKEQLLADERIEEVIIRPLGPVISAHVGSGALALFSLGEENR 286 

SA+ ++E+LLA E I +V++ PLGPVISAHVG LA+F +G+ +R 
Sbjct: 241 SAEKLREELLAHENISDVLMMPLGPVISAHVGPNTLAVFVIGQNSR 286 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 169 

A DNA sequence (GBSx0175) was identified in S.agalactiae <SEQ ID 565> which encodes the amino acid 
sequence <SEQ ID 566>. Analysis of this protein sequence reveals the following: 

Possible site: 56 
>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -8.76 Transmembrane 43 - 59 ( 40 - 62) 

Final Results 
bacterial membrane --- Certainty=0.4503 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 170 

A DNA sequence (GBSx0176) was identified in S.agalactiae <SEQ ID 567> which encodes the amino acid 
sequence <SEQ ID 568>. This protein is predicted to be ribosomal protein L13 (rplM). Analysis of this 
protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 
bacterial cytoplasm --- Certainty=0.3426 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9507> which encodes amino acid sequence <SEQ ID 9508> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 



Query: 36 KTTFMAKPGQVERKWYVVDAADVPLGRLSAVVASVLRGKNKPTFTPHTDTGDFVIVINAE 95 

+TT+MAKP +VERKWYVVDA LGRL++ VAS+LRGK+KPT+TPH DTGD VI+INAE 
Sbjct: 2 RTTYMAKPNEVERKWYVVDAEGQTLGRLASEVASILRGKHKPTYTPHVDTGDHVIIINAE 61 

Query: 96 KVKLTGKKASDKIYYTHSMYPGGLKQISAGELRSKNAVRLIEKSVKGMLPHNTLGRAQGM 155 

K+ LTG K DKIYY HS +PGGLK+ A ++R+ +++E ++KGMLP NTLGR QGM 
Sbjct: 62 KIHLTGNKLQDKIYYRHSGHPGGLKETRAADMRANKPEKMLEIAIKGMLPKNTLGRKQGM 121 

Query: 156 KLKVFVGGEHTHAAQQPEVLDISG 179 

KL V+ G EH H AQ+PEV +4- G 
Sbjct: 122 KLHVYAGSEHKHQAQKPEVYELRG 145 

A related DNA sequence was identified in S.pyogenes <SEQ ID 569> which encodes the amino acid 
sequence <SEQ ID 570>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 
bacterial cytoplasm --- Certainty=0.4249 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 167/184 (90%), Positives = 171/184 (92%), Gaps = 4/184 (2%) 

Query: 1 MFTPFVRPRNLSNTLVDRNIHT--CKQ-KRIRIGEIMNKTTFMAKPGQVERKWYVVDAAD 57 
+FTPF RPRNL NT D H CKQ RIRIGEIMNKTTFMAKPGQVERKWYVVDAAD 
Sbjct: 1 LFTPFERPRNLPNTF-DGTEHPSPCKQILRIRIGEIMNKTTFMAKPGQVERKWYVVDAAD 59 

Query: 58 VPLGRLSAVVASVLRGKNKPTFTPHTDTGDFVIVINAEKVKLTGKKASDKIYYTHSMYPG 117 

VPLGRLSAVVASVLRGKNKPTFTPHTDTGDFVIVINAEKVKLTGKKA+DK+YYTHSMYPG 
Sbjct: 60 VPLGRLSAVVASVLRGKNKPTFTPHTDTGDFVIVINAEKVKLTGKKATDKVYYTHSMYPG 119 

Query: 118 GLKQISAGELRSKNAVRLIEKSVKGMLPHNTLGRAQGMKLKVFVGGEHTHAAQQPEVLDI 177 

GLK I+AGELRSKNAVRLIEKSVKGMLPHNTLGRAQGMKLKVFVGGEHTHAAQQPEVLDI 
Sbjct: 120 GLKSITAGELRSKNAVRLIEKSVKGMLPHNTLGRAQGMKLKVFVGGEHTHAAQQPEVLDI 179 

Query: 178 SGLI 181 

Sbjct: 180 SGLI 183 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 171 

A DNA sequence (GBSx0177) was identified in S.agalactiae <SEQ ID 571> which encodes the amino acid 
sequence <SEQ ID 572>. This protein is predicted to be 30S ribosomal protein S9 (rpsI). Analysis of this 
protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 
bacterial cytoplasm --- Certainty=0.1761 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB11926 GB:Z99104 ribosomal protein S9 [Bacillus subtilis] 
Identities = 88/130 (67%), Positives = 105/130 (80%) 

^ MAQ QY GTGRRK+'+VARVRLVPG G+I +N +++ E+IP A L I QP +T T G+ 

Sbjct: 1 MAQVQYYGTGRRKSSVARVRLVPGEGRIWNNREISEHIPSAALIEDIKQPLTLTETAGT 60 

Query: 61 YDVFVNVVGGGYAGQSGAIRHGISRALLEVDPDFRDSLKRAGLLTRDARMVERKKPGLKK 120 

YDV VNV GGG +GQ+GAIRHGI+RALLE DP++R +LKRAGLLTRDARM ERKK GLK 
Sbjct: 61 YDVLVNVHGGGLSGQAGAIRHGIARALLEADPEYRTTLKRAGLLTRDARMKERKKYGLKG 120 

Query: 121 ARKASQFSKR 130 

AR+A QFSKR 
Sbjct: 121 ARRAPQFSKR 130 

A related DNA sequence was identified in S.pyogenes <SEQ ID 573> which encodes the amino acid 
sequence <SEQ ID 574>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 
bacterial cytoplasm --- Certainty=0.1865 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 124/130 (95%), Positives = 129/130 (98%) 

Query: 1 MAQAQYAGTGRRKNAVARVRLVPGTGKITINKKDVEEYIPHADLRLVINQPFAVTSTQGS 60 

Query: 61 YDVFVNVVGGGYAGQSGAIRHGISRALLEVDPDFRDSLKRAGLLTRDARMVERKKPGLKK 120 

YDVFVNVVGGGY GQSGAIRHGI+RALL+VDPDFRDSLKRAGLLTRDARMVERKKPGLKK 
Sbjct: 61 YDVFVNVVGGGYGGQSGAIRHGIARALLQVDPDFRDSLKRAGLLTRDARMVERKKPGLKK 120 

Query: 121 ARKASQFSKR 130 
ARKASQFSKR 
Sbjct: 121 ARKASQFSKR 130 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 172 

A DNA sequence (GBSx0178) was identified in S.agalactiae <SEQ ID 575> which encodes the amino acid 
sequence <SEQ ID 576>. This protein is predicted to be recombinase (b1345). Analysis of this protein 
sequence reveals the following: 
Possible site: 43 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1939 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
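
Each "Final Results" block lists three candidate locations with a certainty value, and the location marked "Affirmative" is the one with the highest value. As an illustration only, the Python sketch below reads such a block and picks the top-scoring location. The function names are hypothetical, and the pattern deliberately tolerates stray spaces inside the number, since the listings in this document are unevenly spaced.

```python
import re

# location name, optional dashes, "Certainty=<number>", then the label in parentheses.
# The number may contain stray spaces (e.g. "0 . 1939"), which are removed below.
_RESULT = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([0-9][0-9.\s]*?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return a list of (location, certainty, label) tuples found in text."""
    results = []
    for location, value, label in _RESULT.findall(text):
        certainty = float("".join(value.split()))  # drop internal whitespace
        results.append((location, certainty, label.strip()))
    return results


def best_location(text):
    """Return the (location, certainty, label) entry with the highest certainty."""
    return max(parse_final_results(text), key=lambda entry: entry[1])


block = """
bacterial cytoplasm --- Certainty=0.1939 (Affirmative) < succ>
bacterial membrane --- Certainty=0 . 0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
print(best_location(block))
```

Lines whose numeric field is damaged beyond digits, dots and spaces are skipped rather than guessed at.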

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAG29618 GB:AF217235 integrase-like protein [Staphylococcus aureus] 
Identities = 127/386 (32%), Positives = 205/386 (52%), Gaps = 18/386 (4%) 

Query: 3 IHKYPSKKAKNGYLYFVKIYMVKD---SQRADHIKRGFRTRKEAKDYEARLIYLKASGKL 59 
         I KY K Y++ Y+ D ++ +RGF+T +EAK EA+L + 
Sbjct: 2 IKKYKKKDGSTAYMFVA--YLGTDPITGKQKRTTRRGFKTEREAKIAEAKL---QTEVSQ 56 

[Residue lines of the remaining alignment blocks are not legible in the source; the surviving match lines follow.] 

         - +IK 1+ YT VF +A+ +K++ NP A 

         S R I +D +T S+L+ W++ 

         +PL+ ++ N L I K+ K+I HGFRHTH +L* E G+ 

         D+ +FA Y+ 

A related DNA sequence was identified in S.pyogenes <SEQ ID 577> which encodes the amino acid 
sequence <SEQ ID 578>. Analysis of this protein sequence reveals the following: 

Final Results 

bacterial cytoplasm --- Certainty=0.3445 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 109/386 (28%), Positives = 185/386 (47%), Gaps = 28/386 (7%) 

Query: 3 IHKYPSKKAKNGYL-YFVKIYMVKDSQRADHIKRGF--RTRKEA--KDYEARLIYLKASG 57 
         IK K KNG + Y IY+ D +K RTRKE K A+ +L 
Sbjct: 6 IMKITEHKKKNGTIVYRASIYLGIDQMTGKRVKTSITGRTRKEVNQKAKHAQFDFLSNGS 65 

[Residue lines of the remaining alignment blocks are only partly legible in the source; the surviving fragments follow.] 

         Q I +K + N ++S KV + + L+ +N +II+P+++ 

Query: rLQEEPYKHY ALFRLLAYSGLRKGELYALKWADIDFQTETLSV 228 
       L+ +K Y L++LL +GLR GE AL+W DID + T+++ 
Sbjct: [LETSQHKRYNLYFDAVXiYQLIiLSTGLRIGEACALEWGDIDLENGTlAI 241 

       I I +D +T+ L+ + Q + QL + + 
Sbjct: WISTOKKTLRSLK LYQMRQRQLFNEVGARVSEV 295 

       G+++ + H FRHTHA+L++ G+ 

       RLGHA+ MTLDTY H 4 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 173 

A DNA sequence (GBSx0179) was identified in S.agalactiae <SEQ ID 579> which encodes the amino acid 
sequence <SEQ ID 580>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2477 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF63067 GB:AF158600 putative DNA binding protein [Streptococcus thermophilus bacteriophage Sfi11] 
Identities = 32/70 (45%), Positives = 46/70 (65%), Gaps = 3/70 (4%) 

Query: 3 NRLKELRKDKGLTQADIAKVINTNQSQYGKYEKGKTSLSIENSKILADFFGVSIPYLLGL 62 
         NRL LR+ + +T+ +IA+ I ++ K E+G + +S +K LADFFGVS+ YLLGL 
Sbjct: 2 NRLYLLRESRKITRVEIAEKIGVSKLTVLKLEKGTSKISRREAKKLADFFGVSVGYLLGL 61 

Query: 63 D---NNSKIA 69 
          D N+S IA 
Sbjct: 62 DTTENDSLIA 71 

A related DNA sequence was identified in S.pyogenes <SEQ ID 581> which encodes the amino acid 
sequence <SEQ ID 582>. Analysis of this protein sequence reveals the following: 

Possible site: 30 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0680 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 21/61 (34%), Positives = 34/61 (55%) 

         MY R++ LR+D TQ +A +++ + + Y K E G+ +L + + VSI YLL 
Sbjct: 1 MYPRIRNLREDNDFTQKFVANLLSFSHANYAKIERGEVALMADVLVQFYKLYNVSIDYLL 60 

Query: 61 G 61 
          G 
Sbjct: 61 G 61 
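
In BLAST-style listings such as the one above, the middle line of each block carries a letter where the two residues are identical and a "+" where the substitution scores positively. The per-block contribution to the "Identities" and "Positives" totals can therefore be counted directly. This is a small Python sketch under that reading; `score_midline` is a hypothetical helper, and counts from lines damaged by text extraction will be unreliable.

```python
def score_midline(midline):
    """Count identities and positives in one alignment match line.

    A letter marks an identical residue pair; "+" marks a conservative
    substitution. Positives include identities, as in the summary line.
    """
    identities = sum(1 for ch in midline if ch.isalpha())
    conservative = midline.count("+")
    return identities, identities + conservative


# Match line of a short final block: five identical pairs and one "+".
print(score_midline("D N+S IA"))
```

Summing the per-block results over a whole alignment should reproduce the numerators of the summary line when the listing is intact.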


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 174 

A DNA sequence (GBSx0180) was identified in S.agalactiae <SEQ ID 583> which encodes the amino acid 
sequence <SEQ ID 584>. Analysis of this protein sequence reveals the following: 

Possible site: 29 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.5278 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 175 

A DNA sequence (GBSx0181) was identified in S.agalactiae <SEQ ID 585> which encodes the amino acid 
sequence <SEQ ID 586>. Analysis of this protein sequence reveals the following: 

Possible site: 60 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3762 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 176 

A DNA sequence (GBSx0182) was identified in S.agalactiae <SEQ ID 587> which encodes the amino acid 
sequence <SEQ ID 588>. Analysis of this protein sequence reveals the following: 

Possible site: 59 
>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -9.66 Transmembrane 40 - 56 ( 33 - 65) 
INTEGRAL Likelihood = -5.79 Transmembrane 62 - 78 ( 59 - 81) 

Final Results 

bacterial membrane --- Certainty=0.4864 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8505> and protein <SEQ ID 8506> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 7 
McG: Discrim Score: -16.95 
GvH: Signal Score (-7.5): -2.95 
Possible site: 57 
>>> Seems to have no N-terminal signal sequence 
ALOM program count: 2 value: -9.66 threshold: 0.0 
INTEGRAL Likelihood = -9.66 Transmembrane 33 - 49 ( 26 - 58) 
INTEGRAL Likelihood = -5.79 Transmembrane 55 - 71 ( 52 - 74) 
PERIPHERAL Likelihood = 10.87 14 
modified ALOM score: 2.43 



*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.4854 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 
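
The transmembrane predictions above follow a fixed line layout: a likelihood, a core segment, and an extended range in parentheses. In the listings of this document the "count" and "value" fields of the ALOM line appear to equal the number of INTEGRAL segments and the most negative likelihood; that reading is inferred from the listings themselves, not from the program's documentation. A minimal Python sketch, with hypothetical function names:

```python
import re

# "INTEGRAL Likelihood = -9.66 Transmembrane 33 - 49 ( 26 - 58)"
_INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral(text):
    """Return one dict per predicted transmembrane segment."""
    segments = []
    for likelihood, start, end, outer_start, outer_end in _INTEGRAL.findall(text):
        segments.append(
            {
                "likelihood": float(likelihood),
                "core": (int(start), int(end)),
                "extended": (int(outer_start), int(outer_end)),
            }
        )
    return segments


def summarize(segments):
    """Return (segment count, most negative likelihood or None)."""
    if not segments:
        return 0, None
    return len(segments), min(seg["likelihood"] for seg in segments)


listing = """
INTEGRAL Likelihood = -9.66 Transmembrane 33 - 49 ( 26 - 58)
INTEGRAL Likelihood = -5.79 Transmembrane 55 - 71 ( 52 - 74)
"""
print(summarize(parse_integral(listing)))
```

For this listing the summary gives two segments with a most negative likelihood of -9.66, the same figures shown on the "ALOM program" line for this protein.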



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 177 

A DNA sequence (GBSx0183) was identified in S.agalactiae <SEQ ID 589> which encodes the amino acid 
sequence <SEQ ID 590>. Analysis of this protein sequence reveals the following: 

Possible site: 31 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3276 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 178 

A DNA sequence (GBSx0184) was identified in S.agalactiae <SEQ ID 591> which encodes the amino acid 
sequence <SEQ ID 592>. Analysis of this protein sequence reveals the following: 

Possible site: 44 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3482 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9509> which encodes amino acid sequence <SEQ ID 9510> 
was also identified. 



The protein has homology with the following sequences in the GENPEPT database: 





>GP:CA 31 GB:X07371 RepM protein (AA 1 - 314) [Staphylococcus aureus] 
Identities = 89/283 (31%), Positives = 145/283 (50%), Gaps = 26/283 (9%) 

Query: 67 KVSLDNITMTAYIKSKKYLAMKQLIETHLAIWQTAMTDMFRATTGDGIHVVLHMNYDKQ 126 
          K+S D +T+ + + + I + + F+A + +++ YDK 
Sbjct: 42 KLSFDAMTIVGNLNKNSAKKLSDFMSLDPQIRLWDILQTKFKAKA---LQEKVYIE YDKV 

Query: 127 KGQDRKARPFRLEFNPNKLRLVDSEII DTIIPFLEDISISRADLAFDLFEVDCSEF- 182 
           K R R+EFNPNKL E++ II ++ED +R DLAFD FE D S++ 
Sbjct: 99 KADTWDRRNMRVEFNPNKL--THDEMLWLKHNI IDYMEDDGFTRLDLAFD-FEDDLSDYY 155 

Query: 183 -VLEKKGRPTATKEFRSSTGTLETKYLGAPRSEKQVRLYNKKECEQLQNGTDKDKDFASQF 241 
           + EK + T F +TG ETKY G+ S + +R+YNKKKE+ +N D D +++ 
Sbjct: 156 ALSEKALKRTV---FFGTTGKAETKYFGSRDSNRFIRIYNKKKERKENA---DVDVSAE- 208 

Query: 242 KHWWRLEFQLRSRSIDEIFEVI -DTI IFKP--FNLKGLSIETQIYLTALIHDKNIWKKLH 298 
           H WR+E +L+ +D D I KP L+ L + +YL L+H+++ W +LH 
Sbjct: 209 -HLWRVEIELKRDMVDYWOTTCFNDLH ILKPAWATLE SLKEQAMVYL--LLHEESKWGELH 265 

Query: 299 RNTRARYKKILETHQTSDTDYLGLLKDLIjKHERPRLENQLAYY 341 
           RN+R +YK+I++ + S D L+K L L+ Q+ ++ 
Sbjct: 266 RNSRRKYKQIIQ--EISSIDLTDLMKSTLTDNEENLQKQINFW 306 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 179 

A DNA sequence (GBSx0185) was identified in S.agalactiae <SEQ ID 593> which encodes the amino acid 
sequence <SEQ ID 594>. Analysis of this protein sequence reveals the following: 

Possible site: 32 
>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood =-15.55 Transmembrane 137 - 153 ( 133 - 157) 

Final Results 

bacterial membrane --- Certainty=0.7220 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9511> which encodes amino acid sequence <SEQ ID 9512> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8507> and protein <SEQ ID 8508> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 2 
McG: Discrim Score: -16.84 
GvH: Signal Score (-7.5): -5.3 
Possible site: 32 
>>> Seems to have no N-terminal signal sequence 
ALOM program count: 1 value: -15.55 threshold: 0.0 
INTEGRAL Likelihood =-15.55 Transmembrane 137 - 153 ( 133 - 157) 
PERIPHERAL Likelihood = 10.93 60 
modified ALOM score: 3.61 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.7220 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

ORF01844(292 - 702 of 1074) 

EGAD|124517|132830(149 - 295 of 435) apolipoprotein A-IV Mus 
GP|191889|gb|AAA37216.1| (M64250) apolipoprotein A-IV {Mus musculus castaneus} 
%Match = 4.6 
%Identity = 30.0 %Similarity = 54.6 
Matches = 39 Mismatches = 53 Conservative Sub.s = 32 

201 231 261 291 321 351 381 411 

NSSOTRY*LFRFAERLVEA*KTKTRKSARLLWG*DRQK*LSTLLLra^ 

LRDRM^PHANKVTQTFGENMQKLQEHLKPYAVDLQDQIOT 



P--KYKERLKGHLY-RDNQKVF^LDLIGIKILEDYHFENDVIELEKTLGD IQEEFEQEKKGMQY 

KFI^LIIghLpRANELKATID OMLEDLRRSLAPLTVGVQEKLNHQMEGLAFQMKKNAEELQTK 



---RIDRl^KLTPLLEDNQNLVQKNYE-LLNYreSLERQKL^^ 

vsakidqlqkJaplve^ 

270 280 290 300 310 320 330 

SEQ ID 8508 (GBS405) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 171 (lane 4; MW 46kDa - 2 bands) and in Figure 177 (lane 7; MW 46kDa). It 
was also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in 
Figure 76 (lane 5; MW 21kDa). 

GBS405-GST was purified as shown in Figure 218, lane 8. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 180 

A DNA sequence (GBSx0186) was identified in S.agalactiae <SEQ ID 595> which encodes the amino acid 
sequence <SEQ ID 596>. Analysis of this protein sequence reveals the following: 

Possible site: 18 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3406 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA33713 GB:X15669 pre protein (AA 1-494) [Streptococcus agalactiae] 
Identities = 171/402 (42%), Positives = 250/402 (61%), Gaps = 46/402 (11%) 

Query: 1 MS YWAR^KYKSGQLTAI YNHNERI FK1TOSNKEID K 60 
         MSY+VARM K K+G L + HNER+F+ HSNK+I+ +SHLNYELT+RD++ +Y KQIK 
Sbjct: 1 MSYMVARMQKMKAGNLGGAFKHNERVFETHSNKDINPSRSHLNYELTDRDRSVSYEKQIK 60 

Query: 61 EHINENRLSTRGVRKDAILCNEWIITSDKTFFDSLDEKQTREFFETAKDYFAEKYGDANI 120 
          +++NEN++S R +RKDA+LC+EWIITSDK FF+ LDE+QTR FFETAK+YFAE YG++NI 
Sbjct: 61 DYVNENKVSNRAIRKDAVLCDEWIITSDKDFFEKLDEEQTRTFFETAKNYFAENYGESNI 120 

Query: 121 AYARVHLDESTPHMHLGIVPMKNGKLSSKALFGNKEKLVAIQDELPKYLNEHGFNLQRGE 180 
           AYA VHLDESTPHMH+G+VP +NGKLSSKA+F ++E+L IQ++LP+Y+++HGF L+RG+ 
Sbjct: 121 AYASVHLDESTPHMHMGVVPFENGKLSSKAMF-DREELKHIQEDLPRYMSDHGFELERGK 179 

Query: 181 IGSKKKHLETAEFKEKQRLLDNADRKLADKHEELKALDDKISNV-NDTIA 229 
           + S+ KH AEFK ++ +L +K+ +D++ + NDT A 
Sbjct: 180 LNSEAKHKTVAEFKPAMADME-LKEELLEKYHAPPFVDERTGEIJslNDTEAFWHEKEFADM 238 

Query: 230 -DKESRLKEL---EAKEWDAVGDLKQYELEKQSLAESIEDIKDIELLQLDRIQKEDLVKQ 285 
           + +S ++E E +W KQY+ E + L S + ++D D E+L+ + 
Sbjct: 239 FEVQSPIRETTNQEKMDWLR KQYQEELKKLESSKKPLED DLSHLEELLDK 288 

Query: 286 SFDGKLKMDKETYNRLFQTASKHASSNAELKRDLVKAQSQNNHLSRELLNHRKTAEKNIK 345 
           +K+D E AS+ AS +L KA+ N L NH K+ E I + 
Sbjct: 289 KTKEYIKIDSE ASERAS ELSKAEGYINTLE NHSKSLEAKIE 329 

Query: 346 LSQENRKLKDKVKMLDEQVKILNKSLSVWKEKAKEFMPKQVY 387 
           + + +K K + K LN+S + K F+ K+ Y 
Sbjct: 330 CLESDNLQLEKQKATKLEAKALNESELRELKPKKNFLGKEHY 371 

A related DNA sequence was identified in S.pyogenes <SEQ ID 597> which encodes the amino acid 
sequence <SEQ ID 598>. Analysis of this protein sequence reveals the following: 

LPXTG motif: 2025-2030 
Possible site: 52 
>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood =-10.08 Transmembrane 2034 -2050 (2030 -2053) 
INTEGRAL Likelihood = -6.05 Transmembrane 21 - 37 ( 20 - 39) 

Final Results 

bacterial membrane --- Certainty=0.5034 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 
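
The analysis above reports an LPXTG motif near the C-terminus, just before the predicted transmembrane segment. LPXTG is the sortase recognition pattern in which X stands for any residue. Locating it in a protein sequence is a simple pattern search. The Python sketch below is illustrative only: `find_lpxtg` is a hypothetical helper, and the example sequence is a made-up toy string, not SEQ ID 598.

```python
import re

# Lookahead so that overlapping candidates would also be reported.
_LPXTG = re.compile(r"(?=(LP.TG))")


def find_lpxtg(sequence):
    """Return (start, end, motif) for each LPXTG match, using 1-based positions."""
    hits = []
    for match in _LPXTG.finditer(sequence.upper()):
        start = match.start() + 1  # convert 0-based index to 1-based position
        hits.append((start, start + 4, match.group(1)))
    return hits


# Toy sequence for illustration; it is not taken from the sequence listing.
print(find_lpxtg("MKKTALPETGEESN"))
```

The reported positions are residue numbers, so they can be compared directly with the coordinates of the INTEGRAL segments.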
The protein has homology with the following sequences in the databases: 

>GP:AAD03320 GB:AF067776 extracellular matrix binding protein [Abiotrophia defectiva] 
Identities = 362/1396 (25%), Positives = 591/1396 (41%), Gaps = 87/1396 (6%) 

Query: 636 KAEVKLKEAHEATKQAIEKDPWLSPEQKKAQKEKAKARLDEGLKALKAADSLEILKVTEE 695 

+A+ + A +A AI+ + L+ E+K A+K K+A + L + A KT 
Sbjct: 636 EAKNAVNNAAKAKOTAIDM^LTAEEKAAEKAKOTMKNATLAGIDQA KTTAA 689 

Query: 696 AFVDKEKNPDSIPNQHKAGTADQARKQALDSLDKEVQKELESIDNDNTLTTDEKAAAKKK 755 

+ K I + A A AL+ + ++ I LT +EK A + 

Sbjct: 690 RNAAQNKGTTDINAVNPVPVAKPAANAALE QAAVNKINEISQRPDLTREEKQAFMDQ 746 

Query: 756 VNDAYDVAKQTAMEANSYEDLTTI KDEFLS NLPHKQGTPLKDQQSDAIAELEKKQQE 812 

V ADA A + + +T+ +D+ L+ NLP TP + +A+ + + 
Sbjct: 747 WTARDAAMAKVASAANNQAVTSARDQGLNAVNNLP TPAA-KYPEALGHVRQAADA 801 

Query: 813 IEKAIEGDKTLPRDEKEKQIADSKERLKSDTQKVKDAKNADAIKKAFEEGKVNIPQAHIP 872 

+AI + L +E+ + + + + KA +G I 
Sbjct: 802 KRQAIRDNANLTAEEQADALRQVDAAQTAAEAAINQNHTNATLAKADSDGVKAI 855 

Query: 873 GDLN KDKEKLLAELKQKADDTEKAIDVDKTLTEDEKKEQKVKTKAELEKAKTDVKNT 929 

D+N + K L+Q A +AI+ + LT++EK + + L AKT V+ 

Sbjct: 856 ITOINPQPRSKPAANQALEQVAAAKRQAINNNNQLTDEEKAQAIQQVDQALANAKTQVQAA 915 

Query: 930 QTREELDKKA7PELKKAIEDTHVKGNLEGVKNKAIEDLKKAHTETVAKINGDDTLDKATKE 989 

+++ AI + + +G K +AI ++ A ++ G + h + 

Sbjct: 916 NDNNGVNQAKTAGTTAINNINPQGTQ- - -KAQAIAAIEAAEQAKRLELQGRNDLTTEERN 972 

Query: 990 AQVKEADKALAAGKDAITKADDADIWSTAVTEHTPKIKAAHKTGDLKKAQvDANTALDKA 1049 

+ + A KDA+ +A + V+ A +1+ + T +K DA A+D+A 

Sbjct: 973 NALADLTAKAQAAKDAWQARNNTGVAGAKDNGVAQIQGINPTAWKP- - -DARKAIDQA 1029 

Query: 1050 AEKERGEINKDATLTTEDKAKQLKEvETALTKAKDNVKAAKTADAINDARDKGVATIDAV 1109 

A + E + LT E+KA +K+V+ A AK + A + +N+A ++G A I A+ 
Sbjct: 1030 ARDKEAEFQANTiO^TDEEKAARIKKVQDAARDAKAAIDRAGSNGDvWNAWQGKAAIQAI 1089 

Query: 1110 HKAGQDLGARKSGQVAKLEEAAKATKDKISADPTLTSKEKEEQSKAVDAELKKAIEAVNA 1169 

+ K A ++ AA A K I+A+ LT +EK K V+ E KA AV+A 
Sbjct: 1090 KALDDSQPSAKDTAKAAIQNAADAKKARITANNALTQEEKAAAIKQVBDEARKAQAAmA. 1149 

Query: 1170 ADTADKvTJDALGEGVTDIKNQHKSGDSIDARREAHGKELDRVAQETKGAIEKDPTLTTEE 1229 

+ + VD A +G+ I + ++ + 4D+ A + K I D TLT EE 

Sbjct: 1150 SRSKADVDRAKDQGLQKISDV PAVQPPKLNAIAAvDQAATDKKAVINHDTTLTQEE 1205 

Query: 1230 KAKQVKDVDAAKERGMAKLNEAKDADALDKAYGEGVTDIKNQHKSGDPVDARRGLHNKSI 1289 

K ++ VD + +N+A + +G IN ++ A + ++ 

Sbjct: 1206 KEAAIRKVDEEAAKARQAIHDATSNADVAAKQAQGTQAINNVPQT PAAKNAAKAAV 1261 

Query: 1290 DEVAQATKI)AITADTTLTEAEKETQRG^1/DKEATKAKEELAKAKDADALDKAYGDGVTSI 1349 

++ A A K AI D LT EK+ VD+E KA++ + A + +G +1 

Sbjct: 1262 EQAADAKKQAIEMDPNLTRQEKDAAIAKVDQETNKARQAIDAATTNADVTAKQNEGTQAI 1321 

Query: 1350 KMQHKSGKGLDWKDEHKKALEAVAKRVTAEIEADPTLTPEVREQQKAEVQKELELATDK 1409 

++ K K + K A+ A+ + IE DP LT E ++ KA+V E A + 

Sbjct: 1322 NAVPQTPKA KTDAKNAvTQAAEDKKSAIENDPNLTREEKDAAKAKVDAEATKAKNA 1377 

Query: 1410 IAEAKDADEADKAYGDGVTAIENAHVIGKGIEARKDLAKKDIAEAAAKTK^ 1469 

I A D+ +G AI 4- + + +A+ D AK + +AA + K I D LT 
Sbjct: 1378 IDAATSNDDETAKQNEGTQAI NAVPQTPKAKTD-AKNAVTQAADRKKDAIENDPNLT 1433 

Query: 1470 DDQRKEQLLGVDTEYAKGIEKIDAAKDAAGVDKAYSDGVRDILAQYKEGQISnLNDRRNAAK 1529 

+++ VD E K + IDAA A V ++G +1 + + AK 

Sbjct: 1434 REEKVAAKAKVDAEAKKAKDAIDAATSNADVTAKQNEGTKAI NDVPQTPTAKTDAK 1489 

Query: 1530 EFLLKEADKVTKLINDDPTLTHDQKVDQINKVEQMCLDAIKSVDDAQTADAINDALGKGI 1589 
Query: 1590 ENINNQYQHGDGVDVRKATAKGDLEKEflAKVKRLIA 1649 

+ IN+ Q K AK + + A K I KDP LT+ +KD A VDA A 

Sbjct: 1550 KAINDVPQ TPTAKTDAKNAWQAADAKKDAIEKDPNLTRKEKDAAKAKVDAEAKKA 1605 

Query: 1650 IAAVDKATTTEGINQELGKGITAINKAYRPGEGVKARKEAAKADLEKEAAK^KALITNDP 1709 

A+D AT 4 + + G AIN + KAK++A KIND 

Sbjct: 1606 KDAIDAATSNADVTAQKDAGKNAINAVPQ TPTAKTDAKNAVTQAADAKKDAIENDA 1661 

Query: 1710 TLTKADK-AKQTEAVAKALKAAIAAVDKATTAEGINQELGKGITA1NKAYRPGEGVKARK 1768 

LT+ +K A + + A+A KA A+D AT+ + + +G AIN + K 
Sbjct: 1662 NLTREEKDAAKAICVDAEATKAK-NAIDAATSNADVTAKQNEGTKAINDVPQ TPTAK 1716 

Query: 1769 EAAKADLEREAAXVREAIANDPTLTKADK-AKQTEAVAKALKAAIAAVDKATTAEGINQE 1827 

AK +++ A + AI NDP LT+ +K A + + A+A KA A+D AT+ + + 
Sbjct: 1717 TDAKNAVDQAATDKKSAIENDPALTREEKDAAKAKVDAEATKAK-NAIDAATSNADVTAQ 1775 

Query: 1828 LGKGITAINKAyRPGEGVEAHKEAAKANLEKVAKETKALISGDRYLSETEKAVQKQAVEQ 1887 

G AIN + K AK +++ A + KA I D L+ EK K V+ 

Sbjct: 1776 KDAGKNAINAVPQ TPTAKTDAKNAVDQAATDKKAAIENDPALTREEKDAAKAKVDA 1831 

Query: 1888 AIiAI<ALGQVEAAKTVEAVKLAENLGTVAIRSAWAGLAKDTDQATAALNEAKQAAIEALK 1947 

KA ++AA +V++G KDA AKAA+ 

Sbjct: 1832 EAKKAKDAIDAATSNADVTAQKDAG KDAINAVPQTPTAKTDAKNAVD 1878 

Query: 1948 QAAAETLAKITTDAKLTEAQKAEQSENVSLAIiKTAIATVRSAQSIASVKEAKDKGITAIR 2007 

QAA + + I D LT +K V K A + +A S A V + +G AI 

Sbjct: 1879 QAATDKKSAIENDPALTREEKDAVKAKVDAEAKKAKDAIDAATSNADVTAKQTEGTQAIN 1938 

Query: 2008 AAYVPNKAVAKSSSAN 2023 

A VP AK+ + N 
Sbjct: 1939 A- -VPQTPTAKTDAKN 1952 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 77/396 (19%), Positives = 157/396 (39%), Gaps = 48/396 (12%) 

Query: 42 LNYELTNRDQAQNYHKQIKEHINENRLSTRGVRKDAILCNEWIITSDKTFFDSLDEKQTR 101 

L++E+ + ++QN K+I + + D E +1 K +++ EK T 

Sbjct: 338 LDFEILH- PRSQNVSKKISKQVEAKPF DPASYKEKVIAKLKPVYEATSEKITN 389 

Query: 102 EFF--ETAKDYFAEKYGDANIAYARVHLDESTPHMHLGIVPMKNGKLSSKALFG--NKEK 157 

+ + E AKD +K + 1+ G V + +A+ NK 

Sbjct: 390 DAWLDENAKDLQKQKLEEQYIS GKVAISEAGTKQEAIDAAYNKYS 434 

Query: 158 LVAIQDELPKYLNEHGFNLQRGEIGSKKKHLETAEFKEKQRLLDN ADRKLADKHEEL 214 

D LP + N + + ++ ++T + K D K K E L 

Sbjct: 435 SQTDPDSLPSQYKQG--NKENEQEKGRQDLIQTRDLTLKAIQEDKWLTEQEKTIQKEEAL 492 

Query: 215 KALDDKISNVNDTIADKESRLKELEAKEWDAVGDLKQYE LEKQSLAESIE 264 

KA + I +VN T++ ++ + + + K + + K+Y EK+ A E 

Sbjct: 493 KAFETGIESWQWSLEQLKQRLIWKASEKDSEKKEYPESIPNQHIPGKEKEVKAAKQE 552 

Query: 265 DIKDIELLQLDRIQKEDLVKQSFDGKLKMDKETYNRLFQTASKHASSNAELKRDLVKAQS 324 

++K + L++I +++ + E + QAKA+ +L+ DL S 

Sbjct: 553 EI.KKLHDTTLEKINQDKWLTPDQQA3QLKQAEVTFKKGQEAIKSAQTLTQLETDLADYVS 612 

Query: 325 QNNHLSRELLNHRKTAEKNIKLSQENRKLKDKVKMLDEQVK ILNKSLSVWKEKAKE 380 
           +N + + K+ K+ +++ KLK+ + + ++ + + KEKAK 
Sbjct: 613 ENEGKGNSIPDKYKSGNKDDLVNKAEVKLKEAHEATKQAIEKDPWLSPEQKKAQKEKAKA 672 

Query: 381 FMPKQVYRETLSIIOTLNPIGLAKTAIRQVKKMVDS 416 

+ + + + L ++L + + + A +K DS 
Sbjct: 673 RLDEGL- - KALKAADSLEILKVTEEAFVDKEKNPDS 706 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 181 

A DNA sequence (GBSx0187) was identified in S.agalactiae <SEQ ID 599> which encodes the amino acid 
sequence <SEQ ID 600>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2544 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 182 

A DNA sequence (GBSx0188) was identified in S.agalactiae <SEQ ID 601> which encodes the amino acid 
sequence <SEQ ID 602>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2045 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 603> which encodes the amino acid 
sequence <SEQ ID 604>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2045 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 102/111 (91%) , Positives = 107/111 (95%) 

Query: 1 MDYKKYQIIYAPDVLEKLKEIRDYISQNYSSTSGQHKMEQIISDIEKLEVFPEVGFDADE 60 
         +DYKKYQIIYAPDVLEKLKEIRDYISQNYSSTSGQ KMEQIISDIEKLEVFPEVGFDADE 
Sbjct: 1 LDYKKYQIIYAPDVLEKLKEIRDYISQNYSSTSGQRKMEQIISDIEKLEVFPEVGFDADE 60 

Query: 61 KYGSKISKYHSTRGYTLSKDYIVLYHIEEEENRVVIDYLLPTRSDYMKLFK 111 
          KYGSKI YHST+GYTLSKDYIVLYHIE EENR+VIDYLLPT+SDY+KLFK 
Sbjct: 61 KYGSKIIHYHSTKGYTLSKDYIVLYHIEGEENRIVIDYLLPTQSDYIKLFK 111 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 183 

A DNA sequence (GBSx0189) was identified in S.agalactiae <SEQ ID 605> which encodes the amino acid 
sequence <SEQ ID 606>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1621 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 607> which encodes the amino acid 
sequence <SEQ ID 608>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1596 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 91/95 (95%) , Positives = 93/95 (97%) 

Query: 1 MVTAEKNRAVTFQANKELVSEAMTVMKK]^^ 60 

M T +KNRAVTE^ANKELVSEAMTVI^KKNLTLSSALRI1jFLQNWVTNEVDLLTEEELEK 
Sbjct: 1 MTTVKKNRAVTEX^KEIiVSEAMrV^ 60 

Query: 61 EKLFKQFQAEINKNIEDVRQGKFYTSEEVRSELGL 95 
          EKLFKQFQAEINKNIEDVRQGKFYTSEEVR+ELGL 
Sbjct: 61 EKLFKQFQAEINKNIEDVRQGKFYTSEEVRAELGL 95 
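
Each Query or Sbjct row in these alignments carries its own check: the start coordinate plus the number of residues on the row, minus one, should equal the end coordinate, with gap characters not counted. The Python sketch below applies that check to a single row. The helper name `check_row` is hypothetical, and the second example row is a deliberately corrupted copy of the first, used only to show a failing case.

```python
import re

# "Query: 61 <residues and gaps> 95" or the same with "Sbjct:".
_ROW = re.compile(r"^(Query|Sbjct):\s*(\d+)\s+(.*\S)\s+(\d+)\s*$")


def check_row(line):
    """Return (label, start, end, residue_count, consistent) or None if unparsable."""
    match = _ROW.match(line.strip())
    if match is None:
        return None
    label = match.group(1)
    start, end = int(match.group(2)), int(match.group(4))
    # Only letters are residues; "-" gaps and stray spaces are ignored.
    residues = sum(1 for ch in match.group(3) if ch.isalpha())
    return label, start, end, residues, start + residues - 1 == end


good = "Sbjct: 61 EKLFKQFQAEINKNIEDVRQGKFYTSEEVRAELGL 95"
bad = "Sbjct: 61 EKLFKQFQAEINKNIEDVRQGKFYTSEEVRAELGIi 95"  # one extra character
print(check_row(good))
print(check_row(bad))
```

A row that fails the check has gained or lost characters somewhere, which is a useful flag when a listing has passed through text extraction.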

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 184 

A DNA sequence (GBSx0190) was identified in S.agalactiae <SEQ ID 609> which encodes the amino acid 
sequence <SEQ ID 610>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4568 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9513> which encodes amino acid sequence <SEQ ID 9514> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 
>GP:CAA46375 GB:X65276 ORFA1 [Clostridium acetobutylicum] 
Identities = 36/91 (39%), Positives = 51/91 (55%) 

Query: 2 MSQIKLTPEELRISAQKYTTGSQSITDVLTVLTQEQAVIDENWDGTAFDSPEAQENELSP 61 
         M+QI +TPEEL+ AQ Y +1 ++ +IEWGAF++ Q+N+L 
Sbjct: 1 MAQISVTPEEMCSQAQVYIQSKEEIDQAIQKVMSMNSTIAEEWKGQAFQAYLEQYNQLHQ 60 

Query: 62 KITQFAQLLEDIHQQLLKVADWEQTDSDIA 92 
          + QF LLE +NQQL K AD V + D+ A 
Sbjct: 61 TWQFENLLESVNQQLNKYADTVAERDAQDA 91 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 185 

A DNA sequence (GBSx0191) was identified in S.agalactiae <SEQ ID 611> which encodes the amino acid 
sequence <SEQ ID 612>. Analysis of this protein sequence reveals the following: 

Possible site: 21 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4523 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 186 

A DNA sequence (GBSx0192) was identified in S.agalactiae <SEQ ID 613> which encodes the amino acid 
sequence <SEQ ID 614>. Analysis of this protein sequence reveals the following: 

Possible site: 44 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.5339 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 187 

A DNA sequence (GBSx0193) was identified in S.agalactiae <SEQ ID 615> which encodes the amino acid 
sequence <SEQ ID 616>. This protein is predicted to be chromosome assembly protein. Analysis of this 
protein sequence reveals the following: 

Possible site: 61 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4620 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 188 

A DNA sequence (GBSx0194) was identified in S.agalactiae <SEQ ID 617> which encodes the amino acid 
sequence <SEQ ID 618>. Analysis of this protein sequence reveals the following: 

Possible site: 46 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4511 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 189 

A DNA sequence (GBSx0195) was identified in S.agalactiae <SEQ ID 619> which encodes the amino acid 
sequence <SEQ ID 620>. Analysis of this protein sequence reveals the following: 

Possible site: 20 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.5249 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
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Example 190 

A DNA sequence (GBSx0196) was identified in S.agalactiae <SEQ ID 621> which encodes the amino acid 
sequence <SEQ ID 622>. Analysis of this protein sequence reveals the following: 

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3542 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9515> which encodes amino acid sequence <SEQ ID 9516> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 191 

A DNA sequence (GBSx0197) was identified in S.agalactiae <SEQ ID 623> which encodes the amino acid
sequence <SEQ ID 624>. Analysis of this protein sequence reveals the following:

Possible site: 15

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3098 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 192 

A DNA sequence (GBSx0198) was identified in S.agalactiae <SEQ ID 625> which encodes the amino acid
sequence <SEQ ID 626>. This protein is predicted to be rgg protein. Analysis of this protein sequence
reveals the following:

Possible site: 59

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3177 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAA26968 GB:M89776 rgg [Streptococcus gordonii] 
Identities = 74/277 (26%) , Positives = 142/277 (50%) 
Query: 7 IFREFRMIRQFSLKQVASNELSVSQLSRFERGESDLSLTKFLGALEAIDLSISEFMDRVN 66

I + R ++ SLK+VA+ ++SV+QLSR+ERG S L++ F L + +S++EF + 
Sbjct: 10 ILKI IRESKNMSLKEVAAGD I SVAQLSRYERGI SSI/FVDS FYSCLRNMSVSLAEFQYVYH 69 

Query: 67 KYQKSDQISLMSQMAQYHYQRDVAGLEKMISVEEGKLKKDSSDIRCRLNIVLFRGMICEC 126 

Y+++D + L ++++ + ++ LE +++ E ++ +LN ++ R + C 

Sbjct: 70 KTCREADDVVLSQKLSFAQRENmVKLESIIAGSEAMAQ^ 129 

Query: 127 DSSRKMSEEDLCFIjSDYIjFQKDSWE I SDYI LIGNLYRYYNTRH I CQLVKEVINQKEYYRD 186 

+ ++ s+ D+ FL+DYLF + W + L N + E+IN+ ++Y + 

Sbjct: 130 NPDYQVSKGDIEFLTDYLFSVEEWGRYELWLFTNSVNLLTLETLETFASEMINRTQFYNN 189

Query: 187 IYTNRNWEATLIJSFVVETLIERRALEE^TFFLEKl'EALLlOffiRNAYHRlILLYEKGFLAY 246 

+ NR + LLNW IE L+ A FL ++ E + Y R+++ Y K +Y 

Sbjct: 190 LPENRRRIIKMLLNWSACIENNHLQVAMKFLNYIDlSrrKIPETDLYDRVLIKYHKALYSY 249 



Sbjct: 250 KVGNPHARHDIEQCLSTFEYLDSFGVARKLKEQFERI 286 

A related DNA sequence was identified in S.pyogenes <SEQ ID 627> which encodes the amino acid 
sequence <SEQ ID 628>. Analysis of this protein sequence reveals the following: 

d N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.3792 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 79/275 (28%), Positives = 146/275 (52%), Gaps = 11/275 (4%)

Query: 9 REFRIaNRQFSLKQVASNELSVSQLSRFERGESDLSLTKFLGALEAIDLSISEFMDRVNKY 68 

R R +Q S+ +A LS SQ+SRFERGES+++ ++ L b+ ++++I EF+ +K 
Sbjct: 15 RRLRKGKQVSISFLADEYLSKSQISRFERGESEITCSRLLNLLDKLNITIDEFVSAHSKT 74 

Query: 69 QKSDQISLMSQMAQYHYQRDVAGLEKMISVEEGKLKKDSSDIRCRLNIVLFRGMICECDS 128 

4 +L+SQ + + +++V L K++ + KD R + +LF DS 
Sbjct: 75 H - TH FFTLLS QARKC YAE KNWKLTKLIJ KDYAHKD YE - - RTM I KAI L F SIDS 123 

Query: 129 SRKMSEEDLCFLSDYLFQKDSWEISDYIJjIGHLYRYYKTRHICQLVKEVINQKEYYRDIY 188 

S S+E+L L+DYLF+ + W + IL+GN R+ N + L KE++ Y 
Sbjct: 124 SIAPSQEELTRLTDYLFKVEQWGTYEIILLGNCSRFMNYNTLFLLTKEMVASFAYSEQNK 183 

Query: 189 TNRNVVEATLIJWVETLIERRALEE»TFFLEKVEAL^^ 248 

TN+ + v +N + 1+ E + + + K++ LL +E N Y + + LY G+ + 
Sbjct: 184 TNKMLVTQLSINCLIISIDHSCFEHSRYLINKIDLLLRDELNFYEKTVFLYVHGYYKLKQ 243 

^ + G + M+QA+ F+ +G +++EH+ ++ 

Sbjct: 244 EEMSGEEDMRQALQIFKYLGEDSLYYSYKEHYRQI 278 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 193 

A DNA sequence (GBSx0199) was identified in S.agalactiae <SEQ ID 629> which encodes the amino acid 
sequence <SEQ ID 630>. This protein is predicted to be permease. Analysis of this protein sequence
reveals the following: 
Possible site: 15

>>> ... N-terminal signal sequence

[INTEGRAL Likelihood / Transmembrane listing: likelihood values not recoverable; surviving ranges 140 - 156, 139 - 157, 340 - 367]

Final Results

bacterial membrane --- Certainty=0.4227 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>


The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD36408 GB:AE001788 permease, putative [Thermotoga maritima] 
Identities = 97/396 (24%) , Positives = 194/396 (48%) , Gaps = 15/396 (3%) 

Query: 1 MNINGIKLLSSRAVSKLGDVFyDYGNSTWIASMGGLGQKILGIYQIVELLVSIVLNPFGG 60
MNN+ S VS+G Y 4W+SG+ + G++ I L +I+++PF G
Sbjct: 1 MNRNLLLFASGSFVSLIGTRIYQVALAWWLYSKTGSSEYV-GLFMISSFLPAIIVSPFAG 59

Query: 61 ALADRFQRRKILLITDAICAIM---CFLLSFIGDDKVMVYGLIVANAILAVSNAFSSPAY 117
+ DR RR ++++ D + ++ FL+ + + + + L++ +4+V ++F +PA
Sbjct: 60 TVVDRHSRRNMMVVMDILRGVLFMYLFLMEYFSELTMAL--LLIVTVLVSVFDSFFNPAV 117

Query: 118 ...
S +P++V K +++ N+
Sbjct: 118 ...

Query: 178 ...
+ P LG L+ G+ +++++++FLIS 4
Sbjct: 178 ...

Query: 230 ...
IKVE L K K +N+ DI
Sbjct: 236 ...

Query: 295 ...
Y T++S+ + G +1
Sbjct: 296 ...

Query: 355 ...
LV L V V
Sbjct: 352 ...


A related DNA sequence was identified in S.pyogenes <SEQ ID 631> which encodes the amino acid
sequence <SEQ ID 632>. Analysis of this protein sequence reveals the following: 



Possible site: 45

>>> ... N-terminal signal sequence

INTEGRAL Likelihood = ... Transmembrane ... - 188 ( 161 - ...)
INTEGRAL Likelihood = ... Transmembrane ... - 236 ( 218 - ...)
INTEGRAL Likelihood = ... Transmembrane ... - 327 ( 303 - ...)
INTEGRAL Likelihood = ... Transmembrane 98 - 114 ( 95 - ...)
INTEGRAL Likelihood = ... Transmembrane 347 - 363 ( 342 - ...)
INTEGRAL Likelihood = ... Transmembrane 154 - 170 ( 151 - ...)
INTEGRAL Likelihood = ... Transmembrane 284 - 300 ( 281 - ...)
INTEGRAL Likelihood = ... Transmembrane 378 - 394 ( 378 - ...)
INTEGRAL Likelihood = ... Transmembrane ... - 90 ( 73 - ...)
INTEGRAL Likelihood = ... Transmembrane ... - 66 ( 49 - ...)

Final Results

bacterial membrane --- Certainty=0.4270 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 



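[Alignment listing not recoverable; surviving start coordinates and match-line fragments:]

Query: 40 / Sbjct: 36

Query: ... / Sbjct: 96
A++L V+++ +F +P+ + ++V+K+++ + NSL

Query: 159 / Sbjct: 156
K++G+ GV+L++ LSFLI+ 4

Query: 219 / Sbjct: ...

Query: 279 / Sbjct: 273

Query: 339 / Sbjct: 329
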

An alignment of the GAS and GBS proteins is shown below: 

Identities = 136/379 (35%) , Positives = 229/379 (59%) , Gaps = 6/379 (1%) 



Query: 8 ...
Sbjct: 11 ...

Query: 68 RRKILLITDAICAIMCFLLSFIGDDKVMVYGLIVANAILAVSNAFSSPAYKSYIPEIVDK 127
R+KI++ T+ +C C +LSF+ ++ +VY +++ N I LA +AFSSP+YK++ EIV K
Sbjct: 71 RKKIIITTNILCGTACLVLSFLTKEQWLVYAIVLTNVILAFMSAFSSPSYKAFTKEIVJCK 130

Query: 128 ...
N+ LET +1 V+ P++ ++ GI L++D ++FL.L+ L + I
Sbjct: 131 ...

Query: 188 ...
+EKV 1+ I D+ GF Y+ K I +++AL+N FLA +N LLP++N +
Sbjct: 191 ...

Query: 247 -TSGAYATILSISAIGSIIGALIARKIKSSINSMLSKLVFSSLGVIVMGFPS---LFELP 302
+4G Y T L+ AIG IGA+++ + ++SM +L S G+4-+M P +F
Sbjct: 251 ...

Query: 303 ...
h LF+ L++FNI FFS VQ TO ++GRV IFTI I+FMPIGT F ++
Sbjct: 311 ...

Query: 363 ...
Sbjct: 370 ...

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 194 

A DNA sequence (GBSx0200) was identified in S.agalactiae <SEQ ID 633> which encodes the amino acid
sequence <SEQ ID 634>. This protein is predicted to be membrane permease OpuCD. Analysis of this 
protein sequence reveals the following: 



Possible site: 46

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -5.68 Transmembrane ... - 107 ( 88 - 110)
INTEGRAL Likelihood = -4.30 Transmembrane ... - 31 ( 9 - 37)
INTEGRAL Likelihood = -3.72 Transmembrane ... - 88 ( 72 - 88)
INTEGRAL Likelihood = -3.19 Transmembrane 124 - 140 ( 123 - 142)

Final Results

bacterial membrane --- Certainty=0.3272 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 8509> which encodes amino acid sequence <SEQ ID 8510>
was also identified. Analysis of this protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 1
McG: Discrim Score: -10.69
GvH: Signal Score (-7.5): -3.79
Possible site: 39
>>> Seems to have no N-terminal signal sequence
ALOM program count: 5 value: -9.02 threshold: 0.0
INTEGRAL Likelihood = -... Transmembrane ... - 51 ( 25 - 53)
INTEGRAL Likelihood = -... Transmembrane ... - 167 ( 148 - 170)
INTEGRAL Likelihood = -... Transmembrane ... - 91 ( 69 - 97)
INTEGRAL Likelihood = -... Transmembrane ... - 148 ( 132 - 148)
INTEGRAL Likelihood = -... Transmembrane ... - 200 ( 183 - 202)
PERIPHERAL Likelihood = ...
modified ALOM score: 2.30

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.4509 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF91342 GB:AF249729 membrane permease OpuCD [Listeria monocytogenes] 
Identities = 104/154 (67%) , Positives = 133/154 (85%) 

Query: 3 IANVIQTIPSLAMISIIMLGLGLGIKT^/VATVFLYSLLPIITNTYTGIRNVDSDLLDAAK 62 

IAN+IQTIP+LAM++++ML +GLG TW ++FLYSLLPI+ NTYTGIRNVD LL+4 K 
Sbjct: 60 IANIIQTIPALSMIAVLMLIMGLGTNTV^ 119 

Query: 63 GMG^KRQRLF^LPLSISVIMAGI^RNALVmiGITAIGAFVGGGGLGDIIIRGTNATN 122 

MGMTK Q L ++E+PL++SVIMAG+RNALV+AIG+ AIG FVG GGLGDII+RGTNATN 
Sbjct: 120 AMG^KWQvLRLIEMPLALSVIMAGIRNALVIAIGVAAIGTFVGAGGLGDIIVRGTNATN 179 

Query: 123 GGAIILAGSLPTALMAIFSDLILGGIQRMLEPRK 156

G AIILAG++PTA+MAI +D++LG ++R L P K
Sbjct: 180 GTAIILAGAIPTAVMAILADVLLGWVERTLNPVK 213



A related DNA sequence was identified in S.pyogenes <SEQ ID 635> which encodes the amino acid
sequence <SEQ ID 636>. Analysis of this protein sequence reveals the following:

Possible site: 49

>>> ... N-terminal signal sequence

INTEGRAL Likelihood = -... Transmembrane 39 - ...
INTEGRAL Likelihood = -... Transmembrane 190 - ...
INTEGRAL Likelihood = -... Transmembrane 93 - ...
INTEGRAL Likelihood = -... Transmembrane 76 - ...
INTEGRAL Likelihood = -... Transmembrane 221 - ...
INTEGRAL Likelihood = -... Transmembrane ...

Final Results

bacterial membrane --- Certainty=0.4694 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:



Query: 13 MPSLFVTFQNRFNEIVLAALGEHLQISLLSLMIALLIGVPLAALLSRSKRWSDIMLQVTGV 72 

M +L TFQ+RF++WL AL +HLQ+SLL+L++A+L+ +PLA L ++ +D +LQ+ G+ 
Sbjct: 1 MTNLIATFQDRFSDWLTALSQHLQLSLLTLLIAILLAIPIAVFLRYHEKLADWVLQIAGI 60 

Query: 73 FQTIPSLALLGLFIPLMGIGTLPAVTALVIYAIFPILQNTITGLNGIDPSLVEAGIAFGM 132 

FQTIPSLALLGLFIPLMGIGTLPA+TALVIYAIFPILQNTITGL GIDP+L EAGIAFGM 
Sbjct: 61 FQTIPSLALLGLFIPLMGIGTLPALTALVIYAIFPILQNTITGLKGIDPNLQEAGIAFGM 120 

Query: 133 TKWERLKTFEIPIAMPVIMSGVRTSAVMIIGTATLASLIGAGGLGSFILLGIDRNNANLI 192 

T+WERLK FEIP+AMPVIMSG+RT+AV+IIGTATLA+LIGAGGLGSFILLGIDRNNA+LI 
Sbjct: 121 TRWERLKKFEIPLAMPVIMSGIRTAAVLIIGTATLAALIGAGGLGSFILLGIDRNNASLI 180 

Query: 193 LIGAISSALIiAIIFNSLLQYLEKASLRRIMISFGITLLALLASYTPMALSQFSKGKDTW 252 

LIGA+SSA+IiAI FN LL+ +EKA LR I F + L L SY+P L Q K K+ +V 
Sbjct: 181 LIGALSSAVLAIAFNFLLKVMEKAKLRTIFSGFALVALLLGLSYSPALLVQ--KEKENLV 238 

Query: 253 IAGKLGAEPDILINLYKELIEDQSDISVELKSNFGKTSFLYEALKSGDIDMYPEFTGTIT 312 

IAGK+G EP+IL N+YK LIE+ + ++ +K NFGKTSFLYEALK GDID+YPEFTGT+T 
Sbjct: 239 IAGKIGPEPEIIANI^YKTilEENTSMTATVKPNFGKTSFLYEALKKGDIDIYPEFTGTVT 298 

Query: 313 SSLLRDKPPLSNDPKQVYEDAKKGIAKQDKLTLLKPFAYQNTYAVAMPEKLAKEYQIETI 372 

SLL+ P +S++P+QVY+ A4 GIAKQD L LKP +YQNTYAVA+P+K+A+EY ++TI 
Sbjct: 299 ESLLQPSPKVSHEPEQVYQVARDGIAKQDHLAYLKPMSYQNTYAVAVPKKIAQEYGLKTI 358 

Query: 373 SDLKAHADTLKAGFTLEFKDRADGYKGMQSQYGLQLSVATMEPALRYQAIQSGDIQVTDA 432 

SDLK LKAGFTLEF DR DG KG+QS YGL L+VAT+EPALRYQAIQSGDIQ+TDA 

Sbjct: 359 SDLmCVEGQLKAGFTLEFNDREDGNKGLQSMYGLNLNVATIEPALRYQAIQSGDIQITDA 418 

Query: 433 YSTDAEITKYHLKVLKDDKQLFPPYQGAPLMKTSLLTKHPELKGILNQLAGKITEKEMQD 492 

YSTDAE+ +Y L+VL+DDKQLFPPYQGAPLMK +LL KHPEL+ +LN LAGKITE +M 
Sbjct: 419 YSTDAELERYDLQVLEDDKQLFFPYQGAPLMKEALLKKHPELERVLNTLAGKITESQMSQ 478 

Query: 493 MNYEVSVKGADANKVARDYLLKTGLIQK 520 

+NY+V V+G A +VA+++L + GL++K 
Sbjct: 479 LNYQVGVEGKSAKQVAKEFLQEQGLLKK 506 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 53/148 (35%), Positives = 93/148 (62%), Gaps = 1/148 (0%) 

Query: 3 IANVIQTIPSLAMISIIMLGLGLGIKTVVATVFLYSLLPIITNTYTGIRNVDSDLLDAAK 62

+ V QTIPSLA++ + + +G+G V + +Y++ PI+ NT TG+ +D L++A 
Sbjct: 69 VTGVFQTIPSLALLGLFIPLMGIGTDPAVTALVIYAIFPILQNTITGLNGIDPSLVEAGI 128 

Query: 63 GMGMTI<RQRLFMvELPLSISVIMAGLRNALWAIGITAIGAFVGGGGLGDIIIRGTIIATN 122 

GMTK +RL E+P+++ VIM+G+R + V+ IG + + +G GGLG 1+ G + N 
Sbjct: 129 AFGMTKWERLKTFEIPIAMPVIMSGVRTSAVMIIGTATLASLIGAGGLGSFILLGIDRNN 188 

Query: 123 GGAIILAGSLPTALMAIFSDLILGGIQR 150 






+IL G++ +AL+AI + +L +++ 
Sbjct: 189 AN-LILIGAISSALLAIIFNSLLQYLEK 215 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 195 

A DNA sequence (GBSx0201) was identified in S.agalactiae <SEQ ID 637> which encodes the amino acid 
sequence <SEQ ID 638>. This protein is predicted to be choline transporter-related. Analysis of this protein
sequence reveals the following: 

Possible site: 44 

>>> May be a lipoprotein 

INTEGRAL Likelihood = -3.03 



Final Results 

bacterial membrane --- Certainty=0.2211 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9517> which encodes amino acid sequence <SEQ ID 9518> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB15386 GB:Z99121 glycine betaine/carnitine/choline ABC
transporter (osmoprotectant-binding protein) [Bacillus subtilis]
Identities = 168/303 (55%) , Positives = 224/303 (73%) , Gaps = 1/303 (0%) 

Query: 2 LKKSHFLQI FTLCLALLTI SGCQLTDTKKSGHTTIKVAAQSSTESSIMANI ITELIHHEL 61 

+ K +L F L +L + GC L + TIK+ AQS TES I+AN+I +LI H+ 

Sbjct: 1 MTKIKWLGAFALVFVML-LGGCSLPGIjGGASDDTIKIGAQSMTESEIVANMIAQLIEHDT 59 

Query: 62 GYNTTLISNIGSSTVTHQALLRGDADIAATRYTGTDITGTLGLKAVKDPKEASKIVKTEF 121 

NT L+ NLGS+ V HQA+L GD DI+ATRY+GTD+T TLG +A KDPK+A IV+ EF 
Sbjct: 60 DIOTALVKI^Sim/QHQAMLGGDIDISATRYSGTDLTSTLGKEAEKDPKKALNIVQNEF 119 

Query: 122 QKRYNQTWYPTYGFSDTYAFMVTKE FARQNKITKI SDLKKLSTTMKAGVDS SWMNREGDG 181 

QKR++ W+ +YGF +TYAF VTK+FA + I +SDLKK ++ K GVD++W+ R+GDG 
Sbjct: 120 QKRFSYKWFDSYGFDNTYAFTVTKKFAEKEHINTVSDLKKNASQYKLGVDNAWLKRKGDG 179 

Query: 182 YTDFAKTYGFEFSHIYPMQIGLVYDAVESNKMQSVLGYSTDGRISSYDLEILRDDKKFFP 241 

Y F TYGFEF YPMQIGLVYDAV++ KM +VL YSTDGRI +YDL+ 1 L+DDK+ FFP 
Sbjct: 180 YKGFVSTYGFEFGTTYPMQIGLVYDAVKNGKMDAVLAYSTDGRIKAYDLKILKDDKRFFP 239 

Query: 242 PYEASIWVNNSIIKKDPKLKKLLHRLDGKINLKTMQNIjNYMVDDKLLEPSVVAKQFLEKN 301 

PY+ S V+ ++K+ P+L+ ++++L G+I+ +TMQ LNY VD KL EPSWAK+FLEK+ 
Sbjct: 240 PYDCSPVIPEKVLKEHPELEGVINKLIGQIDTETMQEUreEVDGKLKEPSVVAKEFLEKH 299 

Query: 302 HYF 304 
HYF 

Sbjct: 300 HYF 302 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

A related GBS gene <SEQ ID 8511> and protein <SEQ ID 8512> were also identified. Analysis of this
protein sequence reveals the following: 

Lipop: Possible site: 22 Crend: 5 
McG: Discrim Score: 10.26 
GvH: Signal Score (-7.5): -4.19 
Possible site: 44
>>> May be a lipoprotein

ALOM program count: 0 value: 3.65 threshold: 0.0
PERIPHERAL Likelihood = 8.65 66
modified ALOM score: -2.23

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

56.3/75.4% over 287aa

Bacillus subtilis

EGAD|109208| glycine betaine/carnitine/choline ABC Insert characterized
SP|032243|OPCC_BACSU GLYCINE BETAINE/CARNITINE/CHOLINE-BINDING PROTEIN
(OSMOPROTECTANT-BINDING PROTEIN). Insert characterized

GP|2635894|emb|CAB15386.1||Z99121 glycine betaine/carnitine/choline ABC transporter
(osmoprotectant-binding protein) Insert characterized

PIR|E69670|E69670 glycine betaine/carnitine/choline ABC transporter (osmoprotec) opuCC -
Insert characterized

ORF01181(349 - 1212 of 1524)

EGAD|109208|BS3376(15 - 302 of 303) glycine betaine/carnitine/choline ABC {Bacillus
subtilis} SP|032243|OPCC_BACSU GLYCINE BETAINE/CARNITINE/CHOLINE-BINDING PROTEIN PRECURSOR
(OSMOPROTECTANT-BINDING PROTEIN). GP|2635894|emb|CAB15386.1||Z99121 glycine
betaine/carnitine/choline ABC transporter (osmoprotectant-binding protein) {Bacillus
subtilis} PIR|E69670|E69670 glycine betaine/carnitine/choline ABC transporter (osmoprotec)
opuCC - Bacillus subtilis
%Match =33.5

%Identity =56.2 %Similarity =75.3

Matches = 162 Mismatches = 71 Conservative Sub.s = 55



192 252 282 312

VWFFLIVF*QCLIFIFSWYKSGSMKRIWGVXXN*LXXITGNSSNAQN^

MTKIKWLGAFALVFVMLLGGCS

LTDTKKSGHTTIKVAAQSSTESSIMANIITELIHHELGYNTTLISNIGSSTVTHQALLRGDADIAATRYTGTDITGTLGL

| : |||: HI HI [ = || = | =11 |= II 1 = lllh | |||:| || 11=1111=111=1 III

LPGLGGASDDTIKIGAQSMTESEIVANMIAQLIEHDTDLNTALVKNLGSNYVQHQAMLGGDIDISATRYSGTDLTSTLGK

KAVKDPKEASKIVKTEXQKRYNQTWYPTYGFSDTYAF^IVTKEFARQNKITKISDL^CKLSTTMKAGVDSSWMNREGDGYTD

:| ||||:| ||: | ||| = : |: :||| =1111 I HI : | :||||| = = I 111-1= 1 = 1111
EAEKDPKKALNIVQNEFQKRFSYKVIFDSYGFDNTYAFTVTKKFAEKEHINTVSDLKKNASQYKLGTONA

120 130 140 150 160 170 180

882 912 942 972 1002 1032 1062 1092

FAKTYGFEFSHIYPMQIGLvYDAATESNKMQSVLGYSTDGRISSYDLEILRDDKKFFPPYFASMVVNNSIIK3<DPKLKlC[jL

i nun illinium- n =11 iiimm =iihihiii = iiin= i m -1= 1 = 1= -

200 210 220 230 240 250 260

1122 1152 1182 1212 1242 1272 1302 1332

HRLDGKINLKTMQNLNYMVDDKLLEPSVVAKQFLEKI^

= = | 1 = 1= =111 III II II IIIIIIIMMhlll
NKLIGQIDTETMQELNYEVDGKLKEPSWAKEFLEKHHYFD


SEQ ID 8512 (GBS23) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 14 (lane 8; MW 35kDa).

The GBS23-His fusion product was purified (Figure 194, lane 9) and used to immunise mice. The resulting
antiserum was used for Western blot (Figure 251). These tests confirm that the protein is immunoaccessible 
on GBS bacteria. 

Example 196 

A DNA sequence (GBSx0202) was identified in S.agalactiae <SEQ ID 639> which encodes the amino acid 
sequence <SEQ ID 640>. This protein is predicted to be membrane permease OpuCB (opuBB). Analysis of 
this protein sequence reveals the following: 



>>> ... N-terminal signal sequence

INTEGRAL Likelihood = ... Transmembrane 25 - 41
INTEGRAL Likelihood = ... Transmembrane 182 - 198
INTEGRAL Likelihood = -4... Transmembrane 61 - 77
INTEGRAL Likelihood = -4... Transmembrane 78 - 94
INTEGRAL Likelihood = -1... Transmembrane 134 - 150 ( 134 - 150)

Final Results

bacterial membrane --- Certainty=0.4864 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database:

>GP:AAF91340 GB:AF249729 membrane permease OpuCB [Listeria monocytogenes]
= 160/208 (76%)

Query: 1 MWJFLSQYGMQILVKTWEQVYISFFAIALGIArAVPLGWLTRFPKVAKIIIAIASMLCJT 60
+V F + G +LV+TW+ ++IS A+ LGIA+AVP G++LTR PKVA +1 + S+LQT
Sbjct: 4 IVTFFQENGHNLLVQTOQHLFISLSAVIIiGIAV^ 63

Query: 61 IPSLALLALMIPLFGIGKIPAIVALFIYSLLPILRNTYIGMNNVNPTLKDCAKGMGMKPI 120

+PSLA+LA +IP G+G +PAI4ALFIY+LLPILRNT+IG+ V+ L + +GMGM
Sbjct: 64 VPSIAILAFIIPFLGVGTLPAIIALFIYALLPIIiRNTFIGVRGVDKNLIESGRGMGMTNW 123

Query: 121 QSIFQVELPLATPIIMAGIRLSTIYVIAWATLASYIGAGGLGDLIFSGLNLFQSKLILGG 180
Q I VE+P + +IMAGIRLS +YVIAWATLASYIGAGGLGD IF+GLNL++ LILGG

Sbjct: 124 QLIVNVEIPNSISVIMAGIRLSAVYVIAWATIASYIGAGGLGDFIFNGLNLYRPDLILGG 183

Query: 181 TIPVIILSLIIDYLLGLLETALTPRTTR 208
IPV IL+L++++ LG LE LTP+ R
Sbjct: 184 AIPVTILALWEFALGKLEYRLTPKAIR 211



A related GBS gene <SEQ ID 8513> and protein <SEQ ID 8514> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 0 
McG: Discrim Score: -9.08 
GvH: Signal Score (-7.5): -1.86 

Possible site: 37
>>> Seems to have no N-terminal signal sequence
ALOM program count: 5 value: -8.60 threshold: 0.0
INTEGRAL Likelihood = -8.60 Transmembrane ...
INTEGRAL Likelihood = -7.96 Transmembrane ...
INTEGRAL Likelihood = -4.83 Transmembrane ...
INTEGRAL Likelihood = -4.09 Transmembrane ...
INTEGRAL Likelihood = -1.22 Transmembrane 134 - 150
PERIPHERAL Likelihood = 2.70 156
modified ALOM score: 2.22

*** Reasoning Step: 3
Final Results

bacterial membrane --- Certainty=0.4439 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

ORF01825(301 - 927 of 1233) 

GP|9651976|gb|AAF91340.1|AF249729_2|AF249729 (4 - 212 of 218) membrane permease OpuCB
{Listeria monocytogenes}
%Match =30.2

%Identity =57.9 %Similarity =79.9

Matches = 121 Mismatches = 42 Conservative Sub.s = 46

117 147 177 207 237 267 297 327 

STCF*YLKTY*FLCYGRRLT*KVC*AYFKTWFK1RSSC*?*E*LKGHCYSCIPS*YVIRYYLGRY*NGGSIMVNFLSQYG

=1 I: : I 
MDAIVTFFQENG 



357 387 417 447 477 507 537 567

MQILVKTWEQWISFFAIALGIA1AVPXGVVLTRFPKVAKIIIAIASMLQTIPSLALLALMIPLFGIGKIPAIVALFIYS 
:|| = ||::::||: 1= lllhlll |: = ll| Mil =1 = I = I I 1 = I I I I : I I : = I I = = I = I =111 = 11111: 
HNLLVQTWQHLFISLSAVILGIAVAVPTGII^TRSPKVANFVIGWSVLQTVPStAILAFIIPFLGVGTLPAIIABFIYA 
30 40 50 60 70 80 90 


597 627 657 687 717 747 777 807 

LLPILRNTYIGMNNVNPTLKDCAKGMGMKPIQSIFQVELPLATPIIMAGIRLSTIYVIAWATLASYIGAGGLGDLIFSGL 
lllllllhlh 1= I = II 11 = 1 = =11111111 = I I I I I I I I I I I I I I I I I I I = I I = I I 

LLPILRNTFIGVRGVDKNLIESGRGMGMTNWQLIVNVEIPNSISVI^GIRLSAVYVIAWATLASYIGAGGLGDFIFNGL 
110 120 130 140 150 160 170

837 867 897 927 957 ' 987 1017 1047 

NLFQSKLILGGTIPVIILSL1IDYLLGLLETALTPRTTRREA*ICLKNRTFYRYLHFA*PS*RFLWN*PILKSLVIPQL 

ll» Mill Ml 11 = 1 = = = = II II 111= I 

NLYRPDLILGGA1PVTILALWEFALGKLEYRLTPKAIREAREGGE
190 200 210 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 197 

A DNA sequence (GBSx0203) was identified in S.agalactiae <SEQ ID 641> which encodes the amino acid
sequence <SEQ ID 642>. Analysis of this protein sequence reveals the following:

Possible site: 46

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3531 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAF91339 GB:AF249729 ATPase OpuCA [Listeria monocytogenes] 
Identities = 230/380 (60%), Positives = 298/380 (77%), Gaps = 4/380 (1%) 

Query: 6 IIEYQNINKVY-GENVAVEDINLKIYPGDFVCFIGTSGSGKTTLMRMVNHMLKP1NGTLL 64 
+++++++ K Y G AV D+ L I G+FVCFIG SG GKTT M+M+N +H-+PT G +

Sbjct: 1 MLKFEHOTKTYKGGKKAVNDLTLNIDKGEFVCFIGPSGCGKTTTMKMINRLIEPTEGKIF 60 

Query: 65 FKGKDISTINPIELRRRIGYVIQNIGLMPHMTIYENIVLVPKLLKWSEEAKRAKARELIK 124
KDI +P++LRR IGYVIQ IGLMPHMTI ENIVLVPKLLKWSEE K+ +A+ELIK 
Sbjct: 61 INDKDIMAEDPVKLP1JSIGYVIQQIGLMPHMTIRENIVLVPKLLKWSEEKKQERAKELIK 120

Query: 125 ...
LV+LPEE+LDRYP ELSGGQQQRIGV+RALAA+Q++ILMDEPFGALDPITR+ +Q+ K+
Sbjct: 121 ...

Query: 185 ...
LQ+E+GKTII VTHDMDEA+KIA +I4+M +G++VQ TP+++I, +PA SFVE IG++R
Sbjct: 181 ...

Query: 245 ...
L+ A+ D+T V QIM NPVSITA+K+L AIT+M++KRVD+LLV D G ++ GFID+E
Sbjct: 241 ...

Query: 304 ...
V DI++ FYV ED LLR+T +RILK G KY PWD + L GIVTRAS
Sbjct: 301 ...

Query: 364 ...
LVD++YD IWG E TE+Q
Sbjct: 361 ...


A related DNA sequence was identified in S.pyogenes <SEQ ID 643> which encodes the amino acid 
sequence <SEQ ID 644>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3619 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 102/237 (43%), Positives = 165/237 (69%), Gaps = 1/237 (0%) 



Query: 66 KGKDISTINPIELRRRIGYVIQNIGLMPHMTIYENIVLVPKLLKWSEEAKRAKARELIKL 125

+ ++ E+R IGYV+Q I L P+4-T+ ENI ++P++ +WS E R K EL+
Sbjct: 61 NNVPQTELDLREMRLSIGYVLQQIALFPNLTVAENIAIIPEMKQWSAEEIRQKTEELLDK 120



Query: 185 LQEEMGKTIILVTHDMDEALKLATKIIVT'!DNGKKVQEGT'P2vDLLHHPATSFVEQMIG 241 

L +E TI+ VTHD+DEA+KL ++ +++ G++VQ P + HPA +FV + G 
Sbjct: 181 LHKEFDMTIVFVTHDIDEAIKI^-DRVAILNEGEIVQLDRPEMIKTHPANAFVVNLFG 237 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
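
Each alignment above is introduced by a summary line of the form "Identities = a/b (p%), Positives = c/b (q%), Gaps = g/b (r%)". As a minimal sketch, assuming only that layout, the counts can be pulled out in Python so that alignments can be compared numerically. The names `parse_blast_summary` and `fraction` are hypothetical helpers for illustration. The percentages are kept exactly as printed rather than recomputed, because the rounding convention used in the listings is not stated in the text.

```python
import re

# "<Label> = <count>/<length> (<percent>%)", tolerant of irregular spacing.
_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_blast_summary(line):
    """Return {label: (count, aligned_length, reported_percent)}."""
    return {
        label: (int(count), int(length), int(percent))
        for label, count, length, percent in _FIELD.findall(line)
    }


def fraction(stats, label):
    """Return count / aligned_length for one label as a float."""
    count, length, _reported = stats[label]
    return count / length
```

Applied to the GAS/GBS summary line of Example 197, `parse_blast_summary` returns the triple `(102, 237, 43)` under the key `"Identities"`.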

Example 198 

A repeated DNA sequence (GBSx0212) was identified in S.agalactiae <SEQ ID 645> which encodes the 
amino acid sequence <SEQ ID 646>. Analysis of this protein sequence reveals the following: 

Possible site: 24 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4736 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 199 

A DNA sequence (GBSx0213) was identified in S.agalactiae <SEQ ID 647> which encodes the amino acid 
sequence <SEQ ID 648>. Analysis of this protein sequence reveals the following: 

Possible site: 38

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.06 Transmembrane 18 - 34 ( 18 - 34)

Final Results

bacterial membrane --- Certainty=0.1426 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes.

A related GBS gene <SEQ ID 8515> and protein <SEQ ID 8516> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: 20 Crend: 5
Sequence Pattern: CQMN
SRCFLG: 0
McG: Length of UR: 19
Peak Value of UR: 2.60
Net Charge of CR: 3
McG: Discrim Score: 7.77
GvH: Signal Score (-7.5): -4.89
Possible site: 25
>>> May be a lipoprotein
Amino Acid Composition: calculated from 21
ALOM program count: 0 value: 13.21 threshold: 0.0
PERIPHERAL Likelihood = 13.21 115
modified ALOM score: -3.14

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF01527(346 - 455 of 1095)

EGAD|7398|7198(2 - 41 of 47) lysis protein for colicin e9 precursor {Escherichia coli}
EGAD|41475|43808 lysis protein { } SP|P13344|LYS5_ECOLI LYSIS PROTEIN FOR COLICIN E5
PRECURSOR. GP|40543|emb|CAA33861.1||X15857 lysis protein (AA 1-47) {Enterobacteriaceae}
GP|144373|gb|AAA98053.1||M30445 colicin release protein {Plasmid ColE5-099}
PIR|JQ0330|JQ0330 colicin E5 lysis protein precursor - Escherichia coli plasmid ColE5-099

%Match =3.7

%Identity =35.0 %Similarity =52.5

Matches = 14 Mismatches = 19 Conservative Sub.s = 7

135 165 195 225 255 285 315 345

YI YFFHCRRIYI I ININY* FN*GI *NIQMIFCLHVKTIO , IKIRENFVILKLIL*CW* I IVNFI I YLI YKT YILRKENMMR 
KyiKWLIPISIFGMILGGCQMNSEHKIQSNEVKNSKQSSVKKDKKMTKKEQLAYLKEHEQEIIDWKlHNNQIESVQFDW
I I 1 = 1 : : =11 II I I =1 II :h 
KKITWIILLLLAAIILAACQANYIHDVQGGTVSPSSSAELTGLaTQ 



SEQ ID 8516 (GBS389) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 74 (lane 6; MW 18kDa). 

The GBS389-His fusion product was purified (Figure 214, lane 4) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 313), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 
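
The membrane-topology listings in these Examples report each predicted segment on a line of the form "INTEGRAL Likelihood = x Transmembrane a - b". As a minimal sketch, assuming only that line layout, the segments can be collected in Python. The name `parse_integral_segments` is a hypothetical helper for illustration. PERIPHERAL lines, and lines whose values are missing, are deliberately skipped.

```python
import re

# One complete segment line: likelihood value, then the start and end residue.
_SEGMENT = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_integral_segments(text):
    """Return one dict per complete INTEGRAL line found in text."""
    segments = []
    for likelihood, start, end in _SEGMENT.findall(text):
        segments.append(
            {
                "likelihood": float(likelihood),
                "start": int(start),
                "end": int(end),
                # Residue positions are inclusive, so the span is end - start + 1.
                "length": int(end) - int(start) + 1,
            }
        )
    return segments
```

For the single segment reported in Example 199 (residues 18 to 34), the sketch yields a length of 17 residues.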

Example 200 

A DNA sequence (GBSx0214) was identified in S.agalactiae <SEQ ID 649> which encodes the amino acid 

sequence <SEQ ID 650>. Analysis of this protein sequence reveals the following: 

Possible site: 19

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3765 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 201 

A DNA sequence (GBSx0215) was identified in S.agalactiae <SEQ ID 651> which encodes the amino acid
sequence <SEQ ID 652>. Analysis of this protein sequence reveals the following:

Possible site: 45

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3882 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 202

A DNA sequence (GBSx0216) was identified in S.agalactiae <SEQ ID 653> which encodes the amino acid 
sequence <SEQ ID 654>. This protein is predicted to be lectin, alpha subunit precursor. Analysis of this 
protein sequence reveals the following: 

Possible site: 47
>>> Seems to have no N-terminal signal sequence

Final Results
 bacterial cytoplasm --- Certainty=0.0653 (Affirmative) < succ>
 bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
 bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 203 

A DNA sequence (GBSx0217) was identified in S.agalactiae <SEQ ID 655> which encodes the amino acid
sequence <SEQ ID 656>. Analysis of this protein sequence reveals the following:

Possible site: 41
>>> Seems to have no N-terminal signal sequence

Final Results
 bacterial cytoplasm --- Certainty=0.6569 (Affirmative) < succ>
 bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
 bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 204 

A DNA sequence (GBSx0218) was identified in S.agalactiae <SEQ ID 657> which encodes the amino acid 
sequence <SEQ ID 658>. Analysis of this protein sequence reveals the following: 

Possible site: 27
>>> Seems to have no N-terminal signal sequence

Final Results
 bacterial cytoplasm --- Certainty=0.5736 (Affirmative) < succ>
 bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
 bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 205

A DNA sequence (GBSx0219) was identified in S.agalactiae <SEQ ID 659> which encodes the amino acid 
sequence <SEQ ID 660>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-13.11 Transmembrane 146 - 162 ( 138 - 170)
INTEGRAL Likelihood =-12.90 Transmembrane 13 - 29 ( 9 - 32)
INTEGRAL Likelihood = -9.50 Transmembrane 108 - 124 ( 104 - 129)
INTEGRAL Likelihood = -7.75 Transmembrane 40 - 56 ( 33 - 61)
INTEGRAL Likelihood = -6.64 Transmembrane 177 - 193 ( 170 - 195)
INTEGRAL Likelihood = -3.35 Transmembrane 77 - 93 ( 77 - 97)

Final Results
 bacterial membrane --- Certainty=0.6243 (Affirmative) < succ>
 bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
 bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8517> which encodes amino acid sequence <SEQ ID 8518> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 206 

A DNA sequence (GBSx0220) was identified in S.agalactiae <SEQ ID 661> which encodes the amino acid 
sequence <SEQ ID 662>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results
 bacterial cytoplasm --- Certainty=0.2374 (Affirmative) < succ>
 bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
 bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAB89623 GB:AE000990 repressor protein [Archaeoglobus fulgidus]

Identities = 34/62 (54%), Positives = 46/62 (73%)

Query: 11 LKQVREDIGMTQQELAIRIGVRRETIGHLENNRYNPSLEMALKIVKIFDMKIEDIFQLRK 70 

+K+ R MTQ+ELA R+GVRRETI LE +YNPSL++A KI ++F+ KIEDIF + 
Sbjct: 5 IKEFRAKFOTTQEELAKRVGVRRETIVFLEKGKYNPSLKLAYKIARVFNAKIEDIFIFDE 64 

Query: 71 ED 72 
E+ 

Sbjct: 65 EE 66 

There is also homology to SEQ ID 412. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 207

A DNA sequence (GBSx0221) was identified in S.agalactiae <SEQ ID 663> which encodes the amino acid 
sequence <SEQ ID 664>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results
 bacterial cytoplasm --- Certainty=0.3794 (Affirmative) < succ>
 bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
 bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB61817 GB:AL133236 putative acetyl transferase [Streptomyces 
coelicolor A3 (2)] 

Identities = 30/97 (30%), Positives = 52/97 (52%), Gaps = 1/97 (1%) 

Query: 82 VGMLNIWLARADMQWGELGYVFHNQFWSNGYAFESILALLNSTYKKLGFHHIEAQITPG 141 

VGM 4+ + Q GE+ Y+ H + W G E +LL+ +4+ G H I A P 
Sbjct: 72 VGMGDLHVRSHTQRQ-GEISYIVHPRVWGOGIGTEIGRSLLSLGFDRWGLHRIRATCDPR 130 

Query: 142 NERSEKLVRRLGLTYETTRKDFSFENGKWTDKLIYSI 178 

N+ S +++ +LG+TYE + ++ WD L++SI 
Sbjct: 131 NQASSRVLTKLGMTYEGRHRHTAWIRDGWRDSLVFSI 167 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 208 

A DNA sequence (GBSx0222) was identified in S.agalactiae <SEQ ID 665> which encodes the amino acid 
sequence <SEQ ID 666>. This protein is predicted to be p20 protein. Analysis of this protein sequence 
reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results
 bacterial cytoplasm --- Certainty=0.1044 (Affirmative) < succ>
 bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
 bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAA30415 GB:X07542 P20 (AA 1-178) [Bacillus licheniformis]
Identities = 56/175 (32%), Positives = 94/175 (53%), Gaps = 6/175 (3%)

Query: 16 TVLTERLRLQPVELTNvNDFLEFSSDSETVFYMQRYKANTVEEAQVVIiA NVCMKSPL 72 

 T+ TERL L+ +EL + + ++ SD E YM V +A+ ++ ++ ++

Sbjct: 3 TLYTERLTLRKMELEDADVLCQYWSDPEVTKYMLIITPFTDVSQARDMIQMINDIjSIjEGQA 62 

Query: 73 GIYAMIEKESQKMIGIIELEIRDEFS--AEFGYIIiNKNYNGKGYMTEACSKLMSIGFEHL 130 
+++I KE+ ++IG + D+ + AE GY L +Nt- GKG+ +EA KL+ GF L 

Sbjct: 63 NRFSIIVKETDEVIGTCGFNMIDQENGRAEIGYDLGRNHWGKGFASEAVQKLIDYGFTSL 122

Query: 131 DLERIYARFDINNKKSGNVMERIGMKI<EGELRHLAKInI?KGEWKTRAYYSILKEEY 185 

+L Rl A+ + N S ++ + +KEG LR K KG +S+BK EY 

Sbjct: 123 NLNRIEAKVEPENTPSIKLLNSLSFQKEGLLRDYEK-AKGRLIDVYMFSLLKREY 176 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
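
The alignment summaries quoted in these Examples ("Identities = n/m", "Positives = n/m", "Gaps = n/m") can be turned back into plain ratios for comparison between hits. The following is a minimal sketch under stated assumptions: the regular expression and the helper names parse_summary and fractions are hypothetical, and the sample line is the summary reported above for the Bacillus licheniformis P20 hit.

```python
import re

# Matches "Identities = 56/175", "Positives = 94/175", "Gaps = 6/175".
# The pattern is an assumption based on the summary lines in this document.
SUMMARY_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_summary(line):
    """Return {'identities': (n, m), 'positives': (n, m), 'gaps': (n, m)}."""
    return {name.lower(): (int(n), int(m)) for name, n, m in SUMMARY_RE.findall(line)}


def fractions(summary):
    """Convert each (n, m) pair into the ratio n / m."""
    return {name: n / m for name, (n, m) in summary.items()}


line = "Identities = 56/175 (32%), Positives = 94/175 (53%), Gaps = 6/175 (3%)"
summary = parse_summary(line)
for name, value in fractions(summary).items():
    print(f"{name}: {value:.3f}")
```

Any field absent from a summary line (for example "Gaps" in an ungapped alignment) is simply missing from the returned dictionary.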

Example 209 

A DNA sequence (GBSx0223) was identified in S.agalactiae <SEQ ID 669> which encodes the amino acid 
sequence <SEQ ID 670>. Analysis of this protein sequence reveals the following: 

Possible site: 51 

>>> Seems to have no N-terminal signal sequence

Final Results
 bacterial cytoplasm --- Certainty=0.5180 (Affirmative) < succ>
 bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
 bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: 1 MGDWENFTEGKHPKIDTLNGKTVRIEKINPD-HFEDLFQVYGELSTEDSLTYISFSKFN 59 

+G VE +T P+ L G T R+E ++ + H +LF Y E + TY+ F 
Sbjct: 11 VGADWGOTTRAFPEKVVLKGNTCRLEPLDRERHGSELFSAYSEAG-QKLWTYLPAGPFT 69 

Query: 60 SK^FDVFFQTLIjKSEDPYYIAIVDNNTGKVICTFSLMRIDTKNRWEMGWVVYSSKLKQ 119 

+ E+ F + L +++D AI++ T + +GT L+RID N +E+G+W+S +L++ 
Sbjct: 70 NLEEYLEFIKELNETKDTVPFAIINKETERAVGTLCLIRIDEANGSLEVGYWFSPELQK 129 

Query: 120 TRIATEAQYLVMKYVFEELCYRRYEWKCDSLNAPSNNSAKRLGFTFEGTFRQAWYKGRN 179 

T IATEAQ+L+MKYVF++L YRRYEWKCDSLN PS +A RLGF +EGTFRQ WYK3R 
Sbjct: 130 TIIATEAQFLLMKYVFDDLQYRRYEWKCD8LNGPSRRAAMRLGFKYEGTFRQVWYKGRT 189 

Query: 180 RDTNWYSILDKEWPEKKTRFEKWLDDSNFAVNGYQIRSLSSIEQ 223 

ROT W+SI+DKEW + FE+WLD +NF NG Q R +++I + 
Sbjct: 190 RDTQWFSIIDKEWLRIRKTFEEWLDKTNFE-NGKQKRGIAAIRE 232 

No corresponding DNA sequence was identified in S. pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 210 

A DNA sequence (GBSx0224) was identified in S.agalactiae <SEQ ID 671> which encodes the amino acid 
sequence <SEQ ID 672>. Analysis of this protein sequence reveals the following: 

Possible site: 39 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood =-12.15 Transmembrane 25 - 41 ( 20 - 49)

Final Results
 bacterial membrane --- Certainty=0.5861 (Affirmative) < succ>
 bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
 bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8519> and protein <SEQ ID 8520> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: -3.31
GvH: Signal Score (-7.5): -4.44
Possible site: 39
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -12.15 threshold: 0.0
INTEGRAL Likelihood =-12.15 Transmembrane 25 - 41 ( 20 - 49)
PERIPHERAL Likelihood = 11.94 59
modified ALOM score: 2.93

*** Reasoning Step: 3

Final Results
 bacterial membrane --- Certainty=0.5861 (Affirmative) < succ>
 bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
 bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

SEQ ID 672 (GBS43) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 5 (lane 4; MW 34kDa). It was also expressed in E.coli as a GST-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 13 (lane 9; MW 58kDa) and in Figure 15 (lane 
4; MW 59kDa).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 211 

A DNA sequence (GBSx0225) was identified in S.agalactiae <SEQ ID 673> which encodes the amino acid
sequence <SEQ ID 674>. Analysis of this protein sequence reveals the following:

Possible site: 32
>>> May be a lipoprotein

Final Results
 bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
 bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
 bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9519> which encodes amino acid sequence <SEQ ID 9520>
was also identified.

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 212

A DNA sequence (GBSx0226) was identified in S.agalactiae <SEQ ID 675> which encodes the amino acid 
sequence <SEQ ID 676>. Analysis of this protein sequence reveals the following: 

Possible site: 44 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -1.54 Transmembrane 165 - 181 ( 154 - 181)
INTEGRAL Likelihood = -0.85 Transmembrane 67 - 83 ( 67 - 84)

Final Results
 bacterial membrane --- Certainty=0.1617 (Affirmative) < succ>
 bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
 bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAA82211 GB:Z28353 similar to a B.subtilis gene (GB:
BACHEMEHY_5) [Clostridium pasteurianum]
Identities = 40/185 (21%), Positives = 87/185 (46%), Gaps = 6/185 (3%)

[Alignment residues not recoverable; surviving block start positions: Query 18, 78, 138, 194; Sbjct 1, 61, 121, 179]


No corresponding DNA sequence was identified in S. pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
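
The "INTEGRAL ... Transmembrane" rows reported for membrane candidates carry a likelihood score, a segment and a second range in parentheses. The sketch below shows one way such rows might be collected into records; the pattern, the function name parse_tm_rows and the dictionary keys "core" and "extended" are illustrative assumptions rather than terms used by the prediction program. The sample rows are those reported for SEQ ID 676 in Example 212.

```python
import re

# Assumed row shape, as printed in this document:
#   INTEGRAL Likelihood = -1.54 Transmembrane 165 - 181 ( 154 - 181)
ROW_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_rows(text):
    """Return one record per INTEGRAL row found in text."""
    rows = []
    for score, start, end, ext_start, ext_end in ROW_RE.findall(text):
        rows.append(
            {
                "likelihood": float(score),
                "core": (int(start), int(end)),  # segment given before the parentheses
                "extended": (int(ext_start), int(ext_end)),  # range inside the parentheses
            }
        )
    return rows


example = """
INTEGRAL Likelihood = -1.54 Transmembrane 165 - 181 ( 154 - 181)
INTEGRAL Likelihood = -0.85 Transmembrane 67 - 83 ( 67 - 84)
"""

rows = parse_tm_rows(example)
for row in rows:
    start, end = row["core"]
    print(row["likelihood"], start, end, end - start + 1)
```

Counting the records gives the number of predicted membrane-spanning segments for a sequence, and sorting them by "likelihood" reproduces the order in which the rows are listed.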

Example 213 

A DNA sequence (GBSx0227) was identified in S.agalactiae <SEQ ID 677> which encodes the amino acid 
sequence <SEQ ID 678>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results
 bacterial cytoplasm --- Certainty=0.2389 (Affirmative) < succ>
 bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
 bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 214 

A DNA sequence (GBSx0228) was identified in S.agalactiae <SEQ ID 679> which encodes the amino acid 
sequence <SEQ ID 680>. Analysis of this protein sequence reveals the following: 

Possible site:
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood =-13.32 Transmembrane 341 - 357 ( 333 - 361)
INTEGRAL Likelihood =-10.93 Transmembrane 253 - 269 ( 238 - 277)
INTEGRAL Likelihood =-10.77 Transmembrane 172 - 188 ( 166 - 196)
INTEGRAL Likelihood = -8.01 Transmembrane 225 - 241 ( 215 - 251)
INTEGRAL Likelihood = -7.01 Transmembrane 21 - 37 ( 18 - 42)
INTEGRAL Likelihood = -2.66 Transmembrane 285 - 301 ( 283 - 301)

Final Results
 bacterial membrane --- Certainty=0.6328 (Affirmative) < succ>
 bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
 bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB426S4 GB:AL049819 putative integral membrane protein
[Streptomyces coelicolor A3(2)]
Identities = 60/156 (38%), Positives = 101/156 (64%), Gaps = 1/156 (0%)

Query: 176 LMGFMVFFFVFLISGMALLKERTSGTLDRLLATPVKRSDIVFGYMLSYGIIAIIQTIVIV 235
 L+G +FL++ +A L+ERTSGTL+RLLA P+ + D++ GY L++G LAI+Q+ +
Sbjct: 77 LLGIFPLITMFLVTSIATLRERTSGTLERLIAMPLGKGDLIAGYALAFGALAIVQSALAT 136

Query: 236 LSTIWLLDIQWGSIFSVIIVNFILALVALSLGILMSTLAKSEFQMMQFIPLIIMPQLFF 295
 +W L + V GS + +++V + AL+ +LG+ +S A SEFQ +QF+P +I PQL
Sbjct: 137 GLAVWFLGLDVTGSPWLLLLVALLDALLGTALGLFVSAFAASEFQAVQFMPAVIFPQLLL 196

Query: 296 SGII-PLENMASWAQTVGKILPLSYSGDALTKIIMY 330
 G+ P +NM + V +LP+SY+ D + +++ +
Sbjct: 197 CGLFTPRDNMHPALEAVSDVLPMSYAVDGMNEVLRH 232



There is also homology to a DNA sequence which was identified in S.pyogenes <SEQ ID 681> which
encodes the amino acid sequence <SEQ ID 682>. Analysis of this protein sequence reveals the following: 



Possible site: 39
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood =-11... Transmembrane 263 - ... ( 245 - 284)
INTEGRAL Likelihood = ... Transmembrane 231 - ... ( 224 - 258)
INTEGRAL Likelihood = ... Transmembrane 20 - ... ( ... )
INTEGRAL Likelihood = ... Transmembrane 349 - ... ( 345 - 368)
INTEGRAL Likelihood = ... Transmembrane 187 - ... ( 182 - 204)

Final Results
 bacterial membrane --- Certainty=0.5564 (Affirmative) < succ>
 bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
 bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:CAB12662 GB:Z99108 similar to ABC transporter (ATP-binding 
protein) [Bacillus subtilis] 
Identities = 92/369 (24%) , Positives = 180/369 (47%) , Gaps = 25/369 (6%) 

Query: 12 IKRKKTSYVTFFLMPILTTLIALSLSFSNNNQAKIGILDKDNSQISKQFIAQLKQNKKYD 71 

I +K +Y+ F P+L T + S4- N+++ ++ I+D+D++ +S+ +1 QLK + 
Sbjct: 15 IFKKPQNYLIMFAAPLLLTFVFGSMLSGNDDKVRLAIVDQDDTILSQHYIRQLKAHDDMY 74 

Query: 72 IFTKIKKEHIDHYLQDKSLEAvLTIDKGFSDK\TLQGKSQKLNIRSIANSEITEWVKAQTN 131 

+F + + L+ K + ++ I + F ++ +GK +L R VK 

Sbjct: 75 VFENMSESKASEKLKQKKIAGIIVISRSFQTQLEKGKHPELIFRHGPELSEAPMVKQYAE 134 

Query: 132 YLLENYNI IGDVALGNEDTFNR ILQKNQQLNYDVKQVTLTDRSRSKAVSST 182 

L NI A T +K++ + V + TL+D+ S T 

Sbjct: 135 SA1ATLNIQVTAAKTASQTAGENWKAAYKTVFAKJ0SEDIVPAVTRQTLSDKKEGAEASDT 194 

Query: 183 TT GFLLILMLGSTSVIYSGILADKSSQLYHRLMLSNLSRFR YMLSYVCVGFVA 235 

+ GF ++ ++ + IL + + 4-H RL+ +++SR Y+LS+ +G++ 

Sbjct: 195 ASRAAGFSILFVMLTMMGAAGTILEARKNGVWSRLLTASVSRAEIGAGYVLSFFVIGWIQ 254 

Query: 236 FTIQIVIMLSLLKVFNISFFVPTSLLLIIFFLFSLLAIGFGLLIGAITQNSQQSSQLANL 295 

F I ++LS +F 1++ P ++++++ LF L +G GL+I A + +Q NL 
Sbjct: 255 FGI - - -LLLSTHWLFGINWGNPAAVIVLVS-LFLLTWGIGLMIAANVRTPEQQLAFGNL 310 

Query: 296 IVMPTSMLAGCLWPLSITPSYMQAIGKLLPQNWVLSAIA-IFQSGGTLSQAWPYLLALMG 354 

V+ T M++G WP+ I P +MQ+I + LPQ W +S + I +G ++ +L + G 

Sbjct: 311 FVIATC^SGMYWPIDIEPKFMQSIAEFLPQKWAMSGLTEIIANGARVTD ILGICG 366 
Query: 355 TALALISFS 363

LA + + 
Sbjct: 367 ILLAFAAIT 375 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 92/375 (24%) , Positives = 164/375 (43%) , Gaps = 66/375 (17%) 

Query: 11 IKELF RDKRTLAMMFIAPILIMFLMS^ 66 

IK LF R K + FL PIL L+ + S ++N + KIG ++ + +S 
Sbjct: 5 IKTLFVKIKRKKTSWTFFLMPIIjTT-LLALSLSFSNNNQAKIGILDKDNSQISK 58 

Query: 67 HIQWSFKFNSSAKKALKSNKIDALISEDNKSYTVFYANTDSSKTTLT-RQAFKTAVMTM 125 

+F + LK NK + ++ K + Y S + LT + F V 

Sbjct: 59 QFIAQ LKQNKKYDIFTKIKKEHIDHYLQDKSLEAVLTIDKGFSDKVLQG 107 

Query: 126 NSKELISQVKILANKNPKLAQSLQTRSKYIKEKYNY GNKHT GF 168 

S++L I + N +++++++ Y+ E YN GN++T + 

Sbjct: 108 KSQKL NIRSIANSEITEWVKAQTNYLLENYNIIGDVALGNEDTFNRILQKNQQLOT 163 

Query: 169 FAKMIPIL MGFMVFFFVFLISGM--ALLKERTSGTLDRLLATPVKRSD 214 

K + + GF++ + S + +L +++S RL+ + + R 

Sbjct: 164 DVKQVTLTDRSRSKAVSSTTTGFLLILMLGSTSVIYSGILADKSSQLYHRLMLSNLSR-- 221

Query: 215 IVFGYMLSY GILAIIQTIVIVLSTIWLLDIQWGSIFSVIIVNFILALVALSLGILM 271
 F YMLSY G +A IVI+LS + + +1 ++I+ F+ +L+A+ G+L+
Sbjct: 222 --FRYMLSWCVGFVAFTIQIVIMLSLLKVFNISFFVPTSLLLIIFFLFSLLAIGFGLLI 279

Query: 331 GQGLPNVSSNLLVLL 345
 G L LL L+
Sbjct: 339 GGTLSQAWPYLLALM 353



A further related DNA sequence was identified in S.pyogenes <SEQ ID 9081> which encodes the amino 
acid sequence <SEQ ID 9082>. Analysis of this protein sequence reveals the following: 



Possible site: 38
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = ... Transmembrane ... - 37 ( 17 - ... )
INTEGRAL Likelihood = ... Transmembrane 351 - 367 ( 346 - ... )
INTEGRAL Likelihood = ... Transmembrane 262 - 278 ( 260 - ... )
INTEGRAL Likelihood = ... Transmembrane 288 - 304 ( 288 - ... )
INTEGRAL Likelihood = ... Transmembrane 229 - 245 ( 229 - ... )

Final Results
 bacterial membrane --- Certainty=0.6010 (Affirmative) < succ>
 bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
 bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS sequences follows: 

Score = 62.5 bits (149), Expect = 9e-12
Identities = 72/382 (18%), Positives = 166/382 (42%), Gaps = 32/382 (8%)

Query: 1 ^WLFHLIKKESLQIFRNRTALL^1MVIFPILMIVILSFAFKSSFNTATTVPKLTIRYQLEG 60
 M + + +K ++FR++ L MM + PIL++ +++ F ++ NT + + + ++
Sbjct: 1 MRiraiTEKVIKELFIfflKRTIJUMFIA^ 60

Query: 61 EKTDYQKNFLAFLKVLNQKLHLETKPSNSLEKDRQRVSEGALTAVLEVKKNQTIKVITNN 120 

L+ H++ + ++ + + A++ + N++ V N 

Sbjct: 61 N LDNIKHIQVRSFKFNSSAKKALKSNKIDALIS - EDNKSYTVFYAN 105 

Query: 121 INQQNADLINMLVKNYVDNAKTYDSIAALY PQQLNHIRKRSVDYVKVSSIQTSK 174 
 + L K V+ + + 1+ + P+ ++ RS Y+K + +
Sbjct: 106 TDSSKTTLTRQAFKTAVlTITTOSKEIiISQWILaNKNPKIiAQSLQTRS-KyiKE KYNY 161 

Query: 175 GMTSADYYA ISMFTMITFYSMMSAMNLVLSDRQQRITNRIHLTGVSPSFLVFGKL1 230 

 G + ++A I M M+ F+ + + +L +R +R+ TV S +VFG ++

Sbjct: 162 GNKNTGFFAKMIPILMGFMVFFBVFLISGMALLKERTSGTLDRLLATPVKRSDIVFGYML 221 

Query: 231 GAMtATTVQLSLLYIFTRFVLRVNWGTNEWMLIGITASLVYLSVAIGIGLGISIKNEAFIi 290 
+ +Q ++ + T ++L + + + +1 + L +++4+GI + K+E + 
Sbjct: 222 SYGILAIIQTIVIVLSTIWLLDIQWGSIFSVIIVNFILALVALSLGILMSTLAKSEFQM 281

Query: 291 TVASNTI I P I FAFLGGS YVPLTTLHSS I INQLSNISPI KWVNDSLFYL I FGGQYNP - 1 PV 349 

II F G +PL + +S + I P+ + D+L +1 GQ P + 
Sbjct: 282 MQFIPLIIMPQLFFSG-IIPLENM-ASWAQTVGKILPLSYSGDALTKIIMYGQGLPNVSS 339 


Query: 350 TLIVNISIGTIFIILALIGMRK 371 

L+V + 11+ G++ + 
Sbjct: 340 NLLVLLLFLIILTIANIFGLKR 361 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 



Example 215 

A DNA sequence (GBSx0229) was identified in S.agalactiae <SEQ ID 683> which encodes the amino acid 
sequence <SEQ ID 684>. This protein is predicted to be CG1718 gene product (b0794). Analysis of this 
protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -1.17 Transmembrane 118 - 134 ( 117 - 134)

Final Results
 bacterial membrane --- Certainty=0.1468 (Affirmative) < succ>
 bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
 bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 8521> which encodes amino acid sequence <SEQ ID 8522>
was also identified. Analysis of this protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: -10.96
GvH: Signal Score (-7.5): -4.84
Possible site: 15
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -1.17 threshold: 0.0
INTEGRAL Likelihood = -1.17 Transmembrane 142 - 158 ( 141 - 158)
PERIPHERAL Likelihood = 4.98 137
modified ALOM score: 0.73

Final Results
 bacterial membrane --- Certainty=0.1468 (Affirmative) < succ>
 bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
 bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF50837 GB:AE003568 CG1718 gene product [Drosophila melanogaster] 
Identities = 80/204 (39%), Positives = 123/204 (60%), Gaps = 3/204 (1%) 



Query: 7 EIIGLIGPSGAGKSTLIKTMLGMEKADKGTALV- -LDTQMPDRNILNQIGYMAQSDALYE 64 
E GL+G +GAGK+T KMGE+ GAVL+ 41 IGY Q DAL + 
Sbjct: 1394 ECFGLLGVNt^GKTTTFKMMTGDERISSGAAYVQGIiSLESNMNSIYKMIGYCPQFDALLD 1453

Query: 65 SLTGLENLLFFGKMKGIQKTELKQQITHISKWDLENQLDKFVSGYSGGMKRRLSLAIAL 124 

LTG ELF ++G+Q++ ++Q ++K +DK YSGG KR+LS AIA+ 

Sbjct: 1454 DLTGREVLRIFCMLRGVQESRIRQLSEDLAKSFGFMKHIDKQTHAYSGGNKRKLSTAIAV 1513 

Query: 125 LGNPTVLILDEPTVGIDPSLRRKIWQELINIKDEGHS1FITTHVMDEAE-LTSKVALLLR 183 

+G+P+V+ LDEPT G+DP+ RR++W + I+D G SI +T+H M+E E L +++A+++ 
Sbjct: 1514 IGSPSVIYLDEPTTGMDPAARRQLKKMVCRIRD3GKSIVLTSHSMEECEALCTRLAIMVN 1573 

Query: 184 GNI IAFDTPLHLKKQFNVSTIEEV 207 

G + HLK +F+ I ++ 

Sbjct: 1574 GEFKCIGSTQHLKNKFSKGLILKI 1597 

Identities = 73/216 (33%), Positives = 128/216 (58%), Gaps = 9/216 (4%)

Query: 1 MEVFKGEIIGLIGPSGAGKSTLIKTMLGMEKADKGTALV--LDTQMPDRNILNQIGYMAQ 58
 M +F+ EI L+G +GAGK+T I + GM GTA++ D + +G Q
Sbjct: 536 MNMFEDEITVLLGHNGAGKTTTISMLTGMFPPTSGTAI1NGSDIRTNIEGARMSLGICPQ 595

Query: 59 SDALYESLTGLENLLFFGKMKGIQKTELKQQITHISIWVDLENQLDKFVSGYSGGMKRRL 118
 + L++ ++ ++ FF +MKG++ ++Q++ K+++LE++ + S SGGMKR+L
Sbjct: 596 HNVLFDEMSVSNHIRFFSRMKGLRGKAVEQEVAKYLKMIELEDKANVASSKLSGGMKRKL 655

Query: 119 SLAIALLGNPTVLILDEPTVGIDPSLRRKIWQELINIKDEGHSIFITTHVMDEAE-LTSK 177
 S+ AL G+ V++ DEP+ G+DPS RR++W +L+ 4 G ++ +TTH M
Sbjct: 656 SVCCALCGDTKWLCDEPSSGMDPSARRQLW-DLLQQEKVGRTLLLTTHFMDEADVLGDR 714

Query: 178 VALLLRGNI IAFDTPLHLKKQFN VSTIEEVF 208
 +A++ G + T LKKQ+ VS ++ +F
Sbjct: 715 IAIMCDGELKCQGTSFFLKKQYGSGYRLVSGVQNLF 750


A related DNA sequence was identified in S.pyogenes <SEQ ID 685> which encodes the amino acid
sequence <SEQ ID 686>. Analysis of this protein sequence reveals the following:

Possible site: 59
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.43 Transmembrane 49 - 65 ( 49 - 65)

Final Results
 bacterial membrane --- Certainty=0.1171 (Affirmative) < succ>
 bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
 bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB12660 GB:Z99108 similar to ABC transporter (ATP-binding 
protein) [Bacillus subtilis] 
Identities = 151/316 (47%) , Positives = 202/316 (63%) , Gaps = 18/315 (5%) 

Query: 4 VQLTNWKSYKNGKKA-VNDVSLSIEAGNIYGLLGPKGAGKSTLINLILGLIPLSSGKIT 62 

+Q N+ K+Y GKK V +S S++ G +GLL3PKGAGKST I++I GL+P SG IT 
Sbjct: 2 LQAENIKKAY--GKKTIVKGISFSLKKGESFGLLGPNGAGKSTTISMISGLVPHDSGNIT 59 

Query: 63 VLGQS-QKTIRKISSQIGYVPQDIAVYPDLTAYENVELFGSLYGLKGAQLKKQVLKSLEF 121 

V G K K +IG VPQ+IA+YP LTA+EN+ +G +YGL + KK+ + LE+ 
Sbjct: 60 VGGYVIGKETAKAKQKIGIVPQEIALYPTLTAHENLMFWGKI'rYGLTHDEAKKRAAEVLEY 119 

Query: 122 VGLHSQAKQFPSQFSGGMJO^LNIACALVHSPKLIIFDEPTVGIDPQSRNHILESIRLLN 181 

VGL +AK FSGGMKRR+NI AL+H P+L+I DEPTVGIDPQSRNHILE+++ LN 

Sbjct: 120 VGLTERAKDKIETFSGGMKRRINIGAALMHKPBILLII^EPWGIDPQSRNHILETVKQLN 179 

Query: 182 KEGATVIYTTHYMEEVEALCDYIFIMDHGQVI3EGPKFELEKRYVANLANQIIVTLTDSR 241 

+ G TVIYT+HYMEEVE LCD I I+D G++I G K +L R + Q+ V+ + 
Sbjct: 180 ETGMWIYTSHYMEEVEFLCDRIGIIDCGEMIAIGTKTDLCSRLGGDTIIQLTVSGINEA 239 

Query: 242 ... MTSWHQLTQANITFSEIRHNHL 291






Sbjct: 240 FLVAIRSLAHVNDVTVHE LELKIDISAAHHEKWTSLIAEATAHHINLLSLQVQEP 295 

Query: 292 NLEEI FLHLTGKKLRD 307 

NLE +FL+LTG+ LRD 
Sbjct: 296 NLERLFLNLTGRTLRD 311 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 81/211 (38%) , Positives = 125/211 (58%) , Gaps = 2/211 (0%) 

Query: 1 MEVFKGEIIGLIGPSGAGKSTLIKTMLGMEKM)KGTALVL-DTQMPDRNILNQIGYMAQS 59 

+ + G I GL+GP+GAGKSTLI +LG+ G VL +Q R I +QIGY+ Q 

Sbjct: 25 LSIEAGNIYGLLGPNGAGKSTLINLILGBIPLSSGKITVLGQSQKTIRKISSQIGYVPQD 84 

Query: 60 DALYESLTGLENLLFFGKMKGIQKTELKQQITHISKWDLENQLDKFVSGYSGGMKRRLS 119 

A+Y LT EN+ FG + G++ +LK+Q+ + V L +Q +F S +SGGMKRRL+ 
Sbjct: 85 IAVYPDLTAYENVELFGSLYGLKGAQLKKQVLKSLEFVGLHSQAKQFPSQFSGGMKRRLN 144 

Query: 120 IAIALLGNPTVLILDEPWGIDPSLRRKIWQELINIKDEGHSIFITTHVMDEAE-LTSKV 178 

+A AL+ +P ++I DEPTVGIDP R I + + + EG ++ TTH M+E EL + 
Sbjct: 145 IACALVHSPKLIIFDEPTVGIDPQSRNHILESIRLLNKEGATVIYTTHYMEEVEALCDYI 204 

Query: 179 ALLLRGNI IAFDTPLHLKKQFNVSTIEEVFL 209 

++ G +1 L+K++ + 4+ + 

Sbjct: 205 FIMDHGQVlEEGPKFELEKRYVANIiANQIIV 235 

SEQ ID 8522 (GBS391) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 74 (lane 7; MW 30kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 83 (lane 4; MW 55kDa). 

GBS391-GST was purified as shown in Figure 217, lane 3. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 216 

A DNA sequence (GBSx0230) was identified in S.agalactiae <SEQ ID 687> which encodes the amino acid 
sequence <SEQ ID 688>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence

Final Results
 bacterial cytoplasm --- Certainty=0.6732 (Affirmative) < succ>
 bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
 bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 217 

A repeated DNA sequence (GBSx0231) was identified in S.agalactiae <SEQ ID 689> which encodes the 
amino acid sequence <SEQ ID 690>. This protein is predicted to be ISL2 protein. Analysis of this protein 
sequence reveals the following: 

Final Results
 bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
 bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
 bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: 1 MKAQAIVTSQGRIVSLDIAVNYCHDMKLFKMSRIMIGQAAKIIjADSGYQGIMKMYSOAQT 60 

MK QAIVTSQGRIVSLDI VNYCHDMKLFKMSRRNIGQA KILADSGYQG+MK+Y QAQT 
Sbjct: 1 MKTOAIOTSQGRIVSLDITVNYCHDMKLFKMSRRNIGQAGKILADSGYQGLMKIYPQAQT 60 

Query: 61 PRKSSKLKPLTLEDICrYNHTLSKERIKAraNIFAKVKTFKIFSTTYRNRRKEFGLRMNLIA 120 

RKSSKLKPLT+EDK NH LSKER KVENIFAKVKTFK+FSTTYR+ RKRFGLRMNL A 
Sbjct: 61 SRKSSKLKPLTVEDKACNHALSKERSK^7ENIFAKVKTF1OTFSTTYRSHRKRFGLRMNLSA 120 

Query: 121 GMINRELGF 129 

G+IN ELGF 
Sbjct: 121 GIINHELGF 129 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
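
The "Identities" figure quoted with each alignment is a count of aligned positions at which the query and subject carry the same residue. As a minimal sketch of that count (the function name count_identities is a hypothetical helper, and conservative substitutions marked "+" are deliberately not scored), the final block of the alignment above, residues 121-129, can be checked as follows.

```python
def count_identities(query, subject):
    """Count positions where two equal-length aligned strings share a residue.

    Gap characters ('-') are never counted as identities.
    Returns (identical_positions, alignment_length).
    """
    if len(query) != len(subject):
        raise ValueError("aligned strings must have equal length")
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return identical, len(query)


# Final block of the alignment reported in Example 217 (residues 121-129).
query = "GMINRELGF"
subject = "GIINHELGF"

identical, length = count_identities(query, subject)
print(f"Identities = {identical}/{length}")
```

Seven of the nine positions match, in agreement with the middle line of that block (G+IN ELGF), where the two non-identical positions appear as "+" and as a blank.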

Example 218 

A repeated DNA sequence (GBSx0232) was identified in S.agalactiae <SEQ ID 691> which encodes the
amino acid sequence <SEQ ID 692>. This protein is predicted to be ISL2 protein. Analysis of this protein
sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results
 bacterial cytoplasm --- Certainty=0.3996 (Affirmative) < succ>
 bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
 bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAC18595 GB:AJ278419 IS1381 transposase [Streptococcus pneumoniae] 
Identities = 110/125 (88%), Positives = 119/125 (95%) 

Query: 1 ^#IYEASKQLTDTOFKRLVGVQRTTFEEMIAVLK^AYQRKHAKGGRTPKLSLEDLLMATLQ 60 

MNYEASKQLTD RFKRLVGVQRTTFEEMLAVLKTAYQ KHAKGGR PKLSLEDLLMATLQ 
Sbjct: 1 ^^EASKQLTDARFKPiVGVQRTTFSEML^.VLKTAYQLKHAKGGRKPKLSLEDLLMATLQ 60 

Query: 61 YMREYRTYEQIAADFG1HESNLIRRSQWVESTLIQSGFTISKTHLSAEDTVIVDATEVKI 120 

Y+REYRTYE+ IAADFG+HESNL+RRSQWVE TL+QSG TIS+T LS +EDTV+ +DATEVKI 
Sbjct: 61 YWEYRTYEEIAADFGVHESNLLRRSQITOEWLVQSGVTISRTPLSSEDTVMIDATEVKI 120 

Query: 121 NRPKK 125 

NRPKK 
Sbjct: 121 NRPKK 125 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 219 

A DNA sequence (GBSx0233) was identified in S.agalactiae <SEQ ID 693> which encodes the amino acid 
sequence <SEQ ID 694>. Analysis of this protein sequence reveals the following: 
Possible site: 57 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood =-10.40 Transmembrane 130 - 146 ( 123 - 15S)
INTEGRAL Likelihood = -7.86 Transmembrane 169 - 185 ( 167 - 191)
INTEGRAL Likelihood = -6.90 Transmembrane 100 - 116 ( 95 - 118)
INTEGRAL Likelihood = -5.52 Transmembrane 199 - 215 ( 189 - 216)

Final Results
 bacterial membrane --- Certainty=0.5161 (Affirmative) < succ>
 bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
 bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04126 GB:AP001508 unknown conserved protein in others 
[Bacillus halodurans] 
Identities = 47/207 (22%), Positives = 95/207 (45%), Gaps = 14/207 (6%)

Query:

Sbjct:

Query: 66 SVFGEDRHDFINQVIKSTPKISKKEE-TLQRWDLAILLLTIQMIIFLGGYLITEALQQSV 124 

+VFG++ + +++I PK+ KE L + L++ T+ ++F G Y + V 
Sbjct: 66 AVFGDNPKQYADEI IGEIPKMVTKERFGLFAYGLSMFFATV- -LVFSGIYRMLRYYVFQV 123 

Query: 125 PDLIPITLLDVLFAIFISIIAVKIADTIIYATYNFDK SKEKKYFFRYIFLILSLII 180 

+ + + A+ +1 ++ IA ++ + + + K F +1 + +1 

Sbjct: 124 GEAVSEVYVGT- -ALITTIASIVIAWMFVFWFQYFRWSCFRTINKVFEFFILWLGGMIP 181 

Query: 181 AYILIGKYYHLP FINIPLWIYLI 203 

+ Y P I IP+++Y + 

Sbjct: 182 FALFFALLYFTPNVGRMIEIPVYLYFV 208 
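
As an illustrative aside, the summary line that heads each alignment can be checked arithmetically. The sketch below assumes that each percentage is the integer part of the ratio (an assumption that agrees with the 47/207, 95/207 and 14/207 figures above); `parse_summary` and `truncated_percent` are hypothetical helper names.

```python
import re

# Matches e.g. "Identities = 47/207 (22%)"
SUMMARY_FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_summary(line):
    """Map each field name to (count, alignment length, printed percentage)."""
    stats = {}
    for name, count, length, percent in SUMMARY_FIELD.findall(line):
        stats[name] = (int(count), int(length), int(percent))
    return stats


def truncated_percent(count, length):
    """Integer part of 100 * count / length (assumed rounding convention)."""
    return (100 * count) // length


line = "Identities = 47/207 (22%), Positives = 95/207 (45%), Gaps = 14/207 (6%)"
stats = parse_summary(line)
consistent = {
    name: truncated_percent(count, length) == percent
    for name, (count, length, percent) in stats.items()
}
print(stats)
print(consistent)
```

A mismatch between the recomputed and printed percentage would suggest that one of the counts was mis-transcribed.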

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 220 

A DNA sequence (GBSx0234) was identified in S.agalactiae <SEQ ID 695> which encodes the amino acid 
sequence <SEQ ID 696>. This protein is predicted to be minor extracellular protease epr precursor (epr). 
Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-10.72 Transmembrane 10 - 26 ( 5 - 33) 

Final Results

bacterial membrane Certainty=0.5288 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8523> which encodes amino acid sequence <SEQ ID 8524>
was also identified. Analysis of this protein sequence reveals the following:

Lipop Possible site: -1 Crend: 8 
McG: Discrim Score: 12.11 
GvH: Signal Score (-7.5): -4.02

Possible site: 29
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -10.72 threshold: 0.0

INTEGRAL Likelihood =-10.72 Transmembrane 8 - 24 ( 5 - 33)
PERIPHERAL Likelihood = 13.74 219

modified ALOM score: 2.64 



*** Reasoning Step: 3 



Final Results

bacterial membrane --- Certainty=0.5288 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

!GB:Z99123 extracellular serine protease [Bacillus s...

>GP:CAB15866 GB:Z99123 extracellular serine protease [Bacillus subtilis]
Identities = 44/150 (29%), Positives = 80/150 (53%), Gaps = 14/150 (9%)

Query: 37 QMDTVESSVNHVSDSQLTEAQDMLDKFEKKPSEKLLKDVELALNKLSNSSKKEALQKRFK 96

++D V+S N + +A+D + K EK +++ + + A+NKL N + K+ LQKR

Sbjct: 428 RLDKVQSYRN VKDAKDKVAKAEKYKTQQTVDTAQTAINKLPNGTDKKNLQKRLD 481

Query: 97 KAKDKYLKDEADKKATKDATDLVEILEQAPSEENVLKAEAAVNKLTVKESKEALQKRIDT 156

+ K +Y+ A+K A D V E++ + +V A++A+ KL K +LQKR++

Sbjct: 482 QVK-RYI ASKQAKDKVAKAEKSKKKTDVDSAQSAIGKLPASSEKTSLQKRLNK 533

Query: 157 VKTQYGLIGNQTPSSSVAETTEQGTANPAS 186

VK+ Q+ S++ ++T+ A S

Sbjct: 534 VKSTNLKTAQQSVSAAEKKSTDANAAKAQS 563
Identities = 39/124 (31%), Positives = 64/124 (51%), Gaps = 2/124 (1%)

Query:

Sbjct:

Query: 95 FKKAKDKYLKDEADKKATKDATDLVEILEQAPSEENVLKAEAAVNKLTVKESKEALQKRI 154

KKK EA K T A V+E+ ++++ A++AVN+L K LQKR+

Sbjct: 581 LDKVKKKVAAAEAKKVETAKAK--VKKAEKDKTKKSKTSAQSAVNQLKASNEKTKLQKRL 638

Query: 155 DTVK 158
+ VK

Sbjct: 639 NAVK 642



A related DNA sequence was identified in S.pyogenes <SEQ ID 697> which encodes the amino acid 
sequence <SEQ ID 698>. Analysis of this protein sequence reveals the following: 

Possible site: 41 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.99 Transmembrane 24 - 40 ( 23 - 43)

Final Results

bacterial membrane Certainty=0.2996 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:

>GP:CAB15866 GB:Z99123 extracellular serine protease [Bacillus subtilis]
Identities = 43/130 (33%), Positives = 71/130 (54%), Gaps = 8/130 (6%)

Query: 41 GSHPQTQDKVA KHSKSAASLLKKAVKAVM3ADRLATAAAIQEAQKAVDKLAESSKKK 97 

G P + +K + + +K ++ LK A ++V+ A4+ +T A +AQ AV4+L K 
Sbjct: 516 GKXiPASSEKTSLQKRLNKVKSTNLKTAQQSVSAAEKKSTDAl-IAAKAQSAVNQLQAGKDKT 575 

Query: 98 TLQEQLN VAKAKQEQEDAATQAVKAAEET1MQNLKDIAQKAVNDLSNKGKKAALQ 152 

LQ++L+ VA A+ ++ + A VK AE+ + K AQ AVN L +K LQ 

Sbjct: 576 ALQKRLDKVKKCTAAAiaK^TAKAKVKKAEKIDKTKKSKTSAQSAVNQLKASNEKTKLQ 635 

Query: 153 SRLDAILPAK 162 

RL+A+ P K 
Sbjct: 636 KRLNAVKPKK 645 
Identities = 31/105 (29%), Positives = 53/105 (49%), Gaps = 1/105 (0%) 

Query: 54 SKSAASLLKKAVKAvlTOADRIATAAAIQEAQKAvBKIAESSKKKTLQEQLNVAKAKQEQE 113 

+++ S A +AV A++ I +A++ + +L S K L ++ + + 

Sbjct: 380 AQATDSAYAAAEQAVKKAEQTKAQIDINKARELISQLPNSDAKTALHKRLDKVQSYRNVK 439 

Query: 114 DAATQAVKAAEETLNQNLKD IAQKAVNDL SNKGKKAALQSRLDAI 158 

DA + KA E+ Q D AQ A+N L N K LQ RLD + 
Sbjct: 440 DAKDKVAKA-EKYKTQQTVDTAQTA1NKLPNGTDKKNLQKRLDQV 483 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 51/233 (26%), Positives = 115/233 (49%), Gaps = 13/233 (5%)

Query: 2 SMKIDKKELLALIASIILLIFASVTFFLFKDHGTTQMDTVESSVNHVSDSQLTEAQDMLD 61 

SM +KE L + S++ + + +F H TQ + S + + S L +A ++ 

Sbjct: 12 SMTKSQKEALYVmLSVLTITLIGGSCLIFGSHPQTQDKVAKHSKS--AASLLKKAVKAVN 69 

Query: 62 KFEKKPSEKLLKDVELALNKLSNSSKKEALQKRFKKAKDKYLKD^ 121 

++ + +++ + A++KL+ SSKK+ LQ++ AK K +++A AT V+ 

Sbjct: 70 DADRIATAAAIQFAQKAVDKLAESSKKKTLQEQLNVAKAKQEQEDA- ATQAVKA 122 

Query: 122 LEQAPSEENVLKAFJ^WKLTVKESKFJU^KRIDTVKTQYGLIGNQTPSSSVAETTEQGT 181 

E+ ++ A+ AVN L+ K K ALQ R+D + +1 ++ P S E T+ 

Sbjct: 123 AEETLNQNLKD1AQKAVNDLSNKGKKAALQSRLDAILPAKPI I -DEFPRQS -GEITDNSY 180 

Query: 182 ANPASQDTSSYWQOTAPTYE-QPQANNTPvTPGVNIfr/P-TPGTGTVPATNG 232 



SEQ ID 8524 (GBS278) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 52 (lane 6; MW 40kDa). 

The GBS278-His fusion product was purified (Figure 206, lane 10) and used to immunise mice. The
resulting antiserum was used for FACS (Figure 305), which confirmed that the protein is immunoaccessible
on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 221 

A DNA sequence (GBSx0235) was identified in S.agalactiae <SEQ ID 699> which encodes the amino acid 
sequence <SEQ ID 700>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1466 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 222 

A DNA sequence (GBSx0236) was identified in S.agalactiae <SEQ ID 701> which encodes the amino acid
sequence <SEQ ID 702>. This protein is predicted to be N-acetylglucosamine-6-phosphate deacetylase
(nagA). Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4507 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9297> which encodes amino acid sequence <SEQ ID 9298> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAG21588 GB:AY007718 N-acetylglucosamine-6-phosphate deacetylase
[Lactococcus lactis subsp. cremoris]
Identities = 113/178 (63%), Positives = 135/178 (75%)

Query: 131 GIYFEGPYFTEEYKGAQNPIYMRNPNLEEFAQWQKAAKGLITKIALAPEREGVEEFVSAI 190 

GI+FEGP+FTEE KGAQNP YMR+ + E WQ+AA G++ KI LAPEREG E+F+ 
Sbjct: 1 GIFFEGPFFTEEKKGAQNPKYMRDAKMWELEDWQEAAHGMLKKIGLAPEREGSEDFIRKA 60 

Query: 191 TKQGVTVALGHSNGTYKEAKKAVKAGASVIW^AYNGMRGLTHREPGIWGAVYNLPNTYAE 250 

T+ GV +ALGHSN TYK+A V+AGASVWVH +NGM G+TH+EPGMVGA+ N PNTYAE 
Sbjct: 61 TESGWIALGHSNATYKQAVAGVQAGASW'TOJTFNGMEGMTHQEPGMVGAILNTPNTYAE 120 

Query: 251 LICDGHHVDPVACD1LMTQKGHNHVALITDCMAAGGAPDGDYMLGELPVWSNGTARL 308 

LICDGHHV P A +I++ KG +HV LITD MAG PDG YMLGE V V +G A L 
Sbjct: 121 L1CDGHHWPEAAEIWKMKGADHVVLITDSMRAAGLPDGPYMLGEYEVEVRDGAAWL 178 

A related DNA sequence was identified in S.pyogenes <SEQ ID 703> which encodes the amino acid 
sequence <SEQ ID 704>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3114 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 227/300 (75%), Positives = 262/300 (86%)

Query: 9 MTKYI KADRFFYADHVKENGYLE I KDNHFGKWIENI SGQEE I LDYSGYQI APGLVDTHIH 68 

MT Y+KAD F+Y V+ GYL + D FG+W E + + 1 +DY+GYQI APGIjVDTHT H 
Sbjct: 1 MTCYLKADCFYYPTEVRPAGYLSLHDGVFGEITOEIVPADAQIIDYTGYQIAPGLVDTHIH 60

Query: 69 GFAGADVMDCDSEGILRMSAGLLSTGVTSFLPTTLTSDTKHLEEP.SKSVAAVAGE^EQGAK 128

G+AGADVMD ++GI +MS GLL+TGVTSFLPTTLTS ++LE+ S ++A+VA + +GAK 
Sbjct: 61 GYAGADVMDNSAQGIHQMSEGLIATGVTSFLPTTLTSTFEQriEKVSGTIASVADQVKGAK 120 

Query: 129 IQGIYFEGPYFTEEYKGAQNPIYMRNPNLEEFAQWQKAAKGLITKIALAPEREGVEEFVS 188

IQGIYFEGPYFTEEYKGAQNP YM+ P LEEF WQKAAKGLI KIALAPER+GV+EFVS 
Sbjct: 121 IQGIYFEGPYFTEEYKGAQNPSYMKTPRLEEFDAWQKAAKGliIKKIAIAPERDGVKEFVS 180 

Query: 189 AITKQGVWALGHSNGTYKEAKKAVKAGASVWVHAYNGMRGLTHREPGMVGAVYNLPNTY 248 
A+TKQGVWALGHSNGTY+EAK+AV+AGASVVA/HAYNGMRGLTHREPGMVGAVYNLPNTY

Sbjct: 181 AVTKQGVTVALGHSNGTYQEAKEAVQAGASVWVmYNGMRGLTHREPGIWGAWmLPNTY 240 

Query: 249 AELICDGHHVDPVACDILMTQKGHNHVALITDaiAAGGAPDGDYMLGELPVWSNGTARL 308 
AELICDGHHV P+ACDILM QKGH+HVA+ITDCM AGG+PDGDY+LGE VW+NGTARL 
Sbjct: 241 AELI CDGHHVS P I ACD I LMQQKGHDHVAMITDCMRAGGSPDGDYLLGEFS WVANGTARL 300

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 223 

A DNA sequence (GBSx0237) was identified in S.agalactiae <SEQ ID 705> which encodes the amino acid
sequence <SEQ ID 706>. Analysis of this protein sequence reveals the following:

Possible site: 25
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3709 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9307> which encodes amino acid sequence <SEQ ID 9308>
was also identified.



The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB16112 GB:Z99124 yyaQ [Bacillus subtilis] 
Identities = 40/110 (36%), Positives = 62/110 (56%), Gaps = 12/110 (10%)

Query: 121 IAKTFEDSVDYPFAKHPQYASYRVSG--KWYALLFPLKMGKLENVPAQLSED EVEVL 175

4- + + S DYP+ K+P YAS R + KWY L+ 4- +P +L D E+++L

Sbjct: 11 VKEKYGTS PDYPWEKYPNYASLRHTSNKKWYGL IMNV LPEKLGLDGHGEIDIL 63

Query: 176 NIKVNPQDMEILLQKEGIYPSYHMSKKTWVSIVLDNTLSDIEIFKLVSDS 225

N+K P+ + L E I P YHM K+ W+SIVL+ T + EI+ L+ S
Sbjct: 64 NLKCPPEISDRLRNGENILPGYHMDKEHWISIVLERTDPEGEIYNLIEQS 113

A related DNA sequence was identified in S.pyogenes <SEQ ID 707> which encodes the amino acid
sequence <SEQ ID 708>. Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2541 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 114/247 (46%), Positives = 169/247 (68%), Gaps = 1/247 (0%)

Query: 7 MSIESDFFRKKRFIFSSLEEFGFIKSDQEYIYCQTFMDNDFKAIITISLDGKIAGKVIDS 66

Query: 67 ALEEEYLPLRAMJYNGSFVGEVRSAYMAILGDISDSCCKDLLFTKDQSNRLMKIAKTPE 126

LEE+YLPlH- A + G++ G+VR+AY+ +L +S +C + F Q+NRLA+ I K + 
Sbjct: 61 DLEEDYLPLQQAAWQGTYTGQVRAAYLELIjERIjSVACFEATPFQSMQANRLAKHITKEWS 120 

Query: 127 DSVDYPFAKHPQYASYRVSGKWYALLFPLKMGKLFjOTPAQLSEDEVEVIiNIKVHPQDMEI 186 

D +DYPF KHP A+YRV GKWYA++F L KL+ +P +L EV+ +KVNP+ 

Sbjct: 121 DPMDYPFEKHPDIATYRVGGK]TOAMIFSLIADK1,DQIPERLVGQTCEVMTVK™PKAFPQ 180 

Query: 187 LLQKEGIYPSYHMSKKWSIVLDNTLSDIEIFKLVSDSRKLVSHNKKSN-SEPEFWIIP 245 

LLQ+EGIYP+YHMSKK W+SI+LD+ 4+D +++ LV+ SR+LV+ N SN + P++W+IP 
Sbjct: 181 LLQQEGIYPAYHMSKKNWISIILDDKVTDDKLWTLVTQSRQLVNPNGLSNPNGPDYWVIP 240 

Query: 246 ANPKFYD 252 

AN K+YD 
Sbjct: 241 ANLKYYD 247 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 224 

A DNA sequence (GBSx0238) was identified in S.agalactiae <SEQ ID 709> which encodes the amino acid
sequence <SEQ ID 710>. This protein is predicted to be transposase for insertion sequence element IS905.
Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1824 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9601> which encodes amino acid sequence <SEQ ID 9602> 
was also identified. 

A related GBS nucleic acid sequence <SEQ ID 9595> which encodes amino acid sequence <SEQ ID 9596> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAA25167 GB:L20851 transposase [Lactococcus lactis] 
Identities = 325/391 (83%), Positives = 365/391 (93%)

[Rows Query: 12 / Sbjct: 1, Query: 72 / Sbjct: 61, Query: 132 / Sbjct: 121 and Query: 192 / Sbjct: 181: sequence lines illegible; match lines only]

+Y+R+FETKYG V L IPRDRNG FSPAL+P+YGRRD+HLEEMVIKLY+TGVTTREISDI

IERMYGHHYSPAT+SNISKATQENVA+FHERSLEMJY+ 1 /L+LDGTYLPLRRGTVSKECIH

IALG+T G KA+LGY+IAPNENHASWS LL++ + QG+QQVSLW+DGF GL+Q+I QA

Query: 252 FPMAKQQRCLVHIGRNIASKVKRADRALILEQFKTIYRAINVEEAKQALDSFINEWKPHY 311
+P+AKQQRCL+HI RNIASKVKRADRA+ILEQFKTIYRA N+E A QAL++FI EWKP Y
Sbjct: 241 YPIAKQQRCLIHISRNIASKVKRADRAVILEQFKTIYRAENLEMAVQALENFIAEWKPKY 300

Query: 312 KKVIETLESIENLLIFYEFPHQIWGSIYSTNLIESLMKEIKRQTKKKWFPNEESLERYL 371
+KV+E+LE+ +NLL FY+FP+QIW SIYSTNLIESL KEIKRQTKKKV+FPNEE+LERYL

Sbjct: 301 RKAMESLENTDNLLTFYQFPYQIWHSIYSTNLIESLNKEIKRQTKKKVLFPNEEALERYL 360

Query: 372 VTLFSDYNFKQGQRIHKGFGQCTDTLESLFD 402
VTLF DYNFKQ QRIHKGFGQC DTLESLFD
Sbjct: 361 VTLFEDYNFKQSQRIHKGFGQCADTLESLFD 391

A related DNA sequence was identified in S.pyogenes <SEQ ID 711> which encodes the amino acid
sequence <SEQ ID 712>. Analysis of this protein sequence reveals the following:

Possible site: 15
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3054 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 111/128 (86%), Positives = 122/128 (94%)

Query: 12 MTQFTTELLNFLAQKQDIDEFFRSSLETAMNDLLQVELSAFLGYEPYDKAGYNTGNSRNG 71

MTQFTTELLNFLAQKQDIDEFFRSSLE AMNDLLQVELSAFLGYEPY+K GYNTGNSRNG
Sbjct: 1 MTQFTTELLNFLAQKQDIDEFFRSSLEIAMNDLLQVELSAFLGYEPYEKEGYNTGNSRNG 60

Query: 72 AYTRRFETKYGVVNLLIPRDRNGEFSPALIPSYGRRDNHLEEMVIKLYRTGVTTREISDI 131
Y+R+FETKYG+VNL+IPRDRNGEFSP L+PSY RR++HLEE+VIKLY+TGVTTREISDI

Sbjct: 61 TYSRQFETKYGLVNLIIPRDRNGEFSPVLLPSYARREDHLEEIVIKLYQTGVTTREISDI 120

Query: 132 IERMYGHH 139
I+RMYG H
Sbjct: 121 IKRMYGDH 128

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 225 

A DNA sequence (GBSx0239) was identified in S.agalactiae <SEQ ID 713> which encodes the amino acid
sequence <SEQ ID 714>. Analysis of this protein sequence reveals the following:

Possible site: 32

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-12.42 Transmembrane 268 - 284 ( 260 - 285)
INTEGRAL Likelihood = -6.32 Transmembrane 232 - 248 ( 231 - 254)
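
As a sketch only, the INTEGRAL lines above can be turned into a list of predicted transmembrane segments. The layout assumed here is the one printed in this document (a likelihood, a core span and a bracketed extended span); `TM_LINE` and `parse_segments` are hypothetical names, and treating the most negative likelihood as the strongest prediction is an assumption made for the purpose of the example.

```python
import re

# Assumed layout:
# "INTEGRAL Likelihood =<value> Transmembrane <start> - <end> ( <ext start> - <ext end>)"
TM_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?[0-9.]+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text):
    """Return the predicted segments sorted by the start of the core span."""
    segments = []
    for line in text.splitlines():
        match = TM_LINE.search(line)
        if match:
            segments.append({
                "likelihood": float(match.group(1)),
                "core": (int(match.group(2)), int(match.group(3))),
                "extended": (int(match.group(4)), int(match.group(5))),
            })
    return sorted(segments, key=lambda seg: seg["core"][0])


example = """\
INTEGRAL Likelihood =-12.42 Transmembrane 268 - 284 ( 260 - 285)
INTEGRAL Likelihood = -6.32 Transmembrane 232 - 248 ( 231 - 254)
"""

segments = parse_segments(example)
# Assumption: the most negative likelihood marks the strongest prediction.
strongest = min(segments, key=lambda seg: seg["likelihood"])
print(segments)
print(strongest)
```

Sorting by position lists the two segments of <SEQ ID 714> in the order in which they occur along the protein.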

Final Results

bacterial membrane Certainty=0.5967 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD40365 GB:AF036485 hypothetical protein [Plasmid pNZ4000] 
Identities = 69/283 (24%), Positives = 133/283 (46%), Gaps = 9/283 (3%)


Query: 11 INVDDLSLQEERF- LPSELLAYARDENESS- FVRDIEGHLALVYQLLDTQGHVDDVRHVP 68 

IN ++ + E+++ + +++ Y D +ES+ +V DI L L D +R++ 

Sbjct: 19 INAEERATLEDQYGIDEDIIEYVTDNDESTNYVYDINEDDQLFIFLAPYALDKDALRYIT 78

Query: 69 RVI PVTLFLKEDGLFVLANHKNIl^VKKAIiNRV EKVDSPKHLLLSL VTAFSKQYFDV 125

+ P+ L+LFNIVAL +VS +L+ + + 

Sbjct: 79 Q--PFGMLLHKGVLFTF-NQSGIPEVMTALYSPJjDCTEVKSVDAFILETLFTVWSFIPI 135 

Query: 126 LDTISEEOTKIiIiroLRKRPNKSNLARLMLQSGTVHLMMGTKQNFEMLTDLQNIEQDKEN 185 

I+++R+ L L ++ S+L L+ LQ L + N L Ii 

Sbjct: 136 SRRITKKROTLDKMLNRKTKNSDLVSLSYLQQTLTFLS8AVQTNLSELDRLPKTHFGVGA 195 

Query: 186 TRNEKMQLQDAIIEARQLSWCSraSQVFQELS-SYNlWLSIWLITDl'rVTTLTIISIGISI 244 

+++ +D IE Q+ M + +QV + + N++ +NNLND + LTI S+ +++ 
Sbjct: 196 DQDKIDLFEDVQ1EGEQVQRMFEIETQVVDRIDHTMSLAM1TOLNDTMKFLTIWSLTMAV 255 

Query: 245 IAMVTSFYGMNVKLPFDSVDAVWVLIILITTIITIMLSIVMYI 287 

+++ FYGMNVKLP + W+L + 1+ ++ + + I++ + 
Sbjct: 256 PTI I SGFYGMNVKLPLAGMQYAWMLTLGI SWLI VAMLIMLKV 298 

SEQ ID 714 (GBS422) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 172 (lane 7; MW 60kDa).

GBS422-GST was purified as shown in Figure 219, lane 12. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.


Example 226 

A DNA sequence (GBSx0240) was identified in S.agalactiae <SEQ ID 717> which encodes the amino acid
sequence <SEQ ID 718>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence



Final Results

bacterial cytoplasm --- Certainty=0.0783 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB61731 GB:AL133220 putative oxidoreductase [Streptomyces
coelicolor A3(2)]

Identities = 100/306 (32%), Positives = 152/306 (48%), Gaps = 3/306 (0%)

Query: 3 KVRYGWSTAKVAPRFIEGVRIAGNGEWAVSSRTLESAQAFANKYHLPKAYDKLEDMLA 62 

KVR+G+++T +A RF + + EWAV+SRT SA+ FA ++ +P+AY E + 
Sbjct: 8 KVRWGILATGGMAARFTADLVDLPDAEWAVASRTEASAKTFAERFGIPRAYGGWETLAR 67 

Query: 63 DESIDVIYVATINQDHYKVAKAALLAGKHVIiVEKPFTLTYDQANELFAIAESCNLFLMEA 122 

DE +DV+YVAT + H A L AG++VL EKPFTL +A EL ALA +FLMEA 
Sbjct: 68 DEDVDWYVATPHSAHRTAAGLCIjEAGFJTVLCEKPFTIiMAREAAEL 127 

Query: 123 QKSVFIPMTQVIKiCLLASGEIGEVISISSTTAYPN-IDHvTWFRELELGGGTVHFMAPYA 181 

P+ + +K+L+A G IGEV S+ + R+ GGG + + Y 

Sbjct: 128 MWMYCNPLVRRLKELVADGAIGETOSLQADFGLAGPFPAAHRLRDPAQGGGALLDLGVYP 187 

Query: 182 LSYLQYLFDATITHASGTATFPKGQSDSQSKLLLQLSNGVLVDIFLTTRLNLPHEMIIYG 241 

+S+ Q L T + A + D Q+ LL N L I + P+ I G 

Sbjct: 188 VSFAQLLLGEP-TDVAARAVLSEEGVDLQTGALLSYGNDALASIHCSITGGTPNSASITG 246 



Sbjct: 247 SEGRIDVPNGFFFPDHFVLHRTGRDPQEFRADPADGPRESLRHEAEEVMRALRAGETESP 306 
Query: 301 QLTLSG 306
Sbjct: 307 LVPLDG 312

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 227 

A DNA sequence (GBSx0241) was identified in S.agalactiae <SEQ ID 721> which encodes the amino acid
sequence <SEQ ID 722>. This protein is predicted to be valyl-tRNA synthetase (valS). Analysis of this 
protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.00 Transmembrane 794 - 810 ( 794 - 810) 

Final Results

bacterial membrane Certainty=0.1001 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAA57558 GB:L08854 valyl-tRNA synthetase [Lactobacillus casei] 
Identities = 543/881 (61%), Positives = 679/881 (76%), Gaps = 12/881 (1%) 

Query: 5 LSPKYNPAEVEEGRYQTWLDQDVFKPSGDTEAKPYSIVIPPPNVTGKLHLGHAVTOTTLQD 64 

L+PKY+ VEEGRYQ WLD+DVFKPSGD +AKPYSIVIPPPNVTGKLH+GHAWDTTLQD 
Sbjct: 27 LAPKYDHKAVEEGRYQEWLDEDVFKPSGDK3CAKPYSIVIPPPMVTGKLHMGHAWDTTLQD 86 

Query: 65 IIIRQKRMQGFDTLWLPGMDHAGIATC^UOTEERLREQGISRYDLGREKFLDKVI'IEWKDEY 124 
I+IRQKR++GFDTLWLPGMDHAGIATQAKVE +LR++GISRYDLGREKF+ KVWEWKDE+ 

Query: 125 AATIKSQWGKMGLSVDYSRERFTLDEGLSKAWKVFVDLYNKGWIYRGEFIINWDPAART 184 

A TI QW KMGLS+DYSRERFTLD+GL++AVR+VFVDLYN+G IYRGE+I+NWDP ART 
Sbjct: 147 AKTIHGQWAKMGLSLDYSRERFTLDKGUJQAVTmVFVDLYNQGLIYRGEYIVHWDPQART 206 

Query: 185 ALSDIEVIHKDVEGAFYHMNYMLEDGSRALEVATTRPETMFGDVAVAVNPEDARYKDLIG 244 

ALSDIEVIHKD +GAFYH+ Y DGS +E+ATTRPETM GD AVAV+P D RYKD++G 
Sbjct: 207 ALSDIEVIHKDDKGAFYHVKYPFADGSGYIEIATTRPETMMGDTAVAVHPGDERYKDMVG 266 

Query: 245 QNVILPIINKPIPIVADEHADPEFGTGWKITPAHDPOTJFAVGQRHNLPQVNVMNDDGTM 304 

+ ILP+ N+ IPI+ D + DPEFGTG VKITPAHDPNDF VG RH+L ++N MNDDGTM 
Sbjct: 267 TELILPLANRKIPIIEDAYVDPEFGTGAVKITPAHDPKDFQVGNRHDLKRINTMNDDGTM 326 

Query: 305 NEIiADEFNGMDRFFJVRKAVV7AKLESLGNLVKIKKTTHSVGHSERTGVVVEPRIjSTQWFVK 364 

NE A ++ GMDRFEARKA+VA L+ G L+K++ HSVGHSERTGV VE RLSTQWFVK 
Sbjct: 327 NENAGKYCGMDRFEARKAIWADLDKAGLLLKVEPIVHSVGHSERTGVQ\rE^LSTQWFVK 386 

Query: 365 MDQLAKNAI -ANQDTEDKVEFYPPRFNDTFMSWMENVHDWVI SRQLWWGHQI PAWYN- VN 422 

M LA+ AI A Q+ + KV F P RF T++ WMEN+HDWVI SRQLWWGHQI PAWYN 
Sbjct: 387 MKPLAEAAIKAQQEPDKKVTFVPERFEHTYLQV^IENIHDWVISRQLWWGHQIPAWYNKQT 446 

Query: 423 GEMYVGEDAPEG-DGWTQDEDVLDTWFSSALWPFSTMGWPDTEAADFKRYFPTSTLVTGY 481 

GE YVG +AP+ + W QD DVLDTWFSSALWPFSTMGWP+T+A D+KRY+PT TLVTGY 
Sbjct: 447 GETYVGMEAPKDIENWKQDPDVLDTWFSSALWPFSTMGWPNTDAPDYKRYYPTDTLVTGY 506 

Query: 482 DIIFFWVSRMIFQSLEFTGRQPFSNVLIHGLIRDEEGRKMSKSLGNGIDPMDVIEKYGAD 541 

DII FWV+RMIFQ L FT ++PF IiIHGL+RDE+GRKMSKSLGNGIDPMDVIEKYGIAD 
Sbjct: 507 DIIPFWVARMIFQGLHFTHQRPFQYTLIHGLMRDEQGRKMSI<BLGNGIDPMDVIEKYGAD 566 

Query: 542 ALRWFLSNGSAPGQDTOFSYEKtmASWNFINKITOISRYILMNNEGLTLDQARENVEKVV 601 

ALRWFL G+ PGQD RFSY++++A+WNFINKIWNISR+++MN L Q + 
Sbjct: 567 ALRWFLITGNKPGQDTRFSYKQX'EAAI'JNFIKKIVJNISRFVMMNLGDLDTPQQPD 620

Query: 602 NSQVGNVTDRWI LHNIiNETVGKVTENFDKFE FGVAGH I LYNFIWEE FANWYVELTKEVLY 661

+++D+W+ LNET+ +V + +FEFG G LYNF W A+WYVE++KEVLY 
Sbjct: 621 -PSTFDLSDKWLFAQIJffiTIKQVMDLSAHFEFGEMGRTLyNFTWNVLADWYVEMSKEVLy 679 

Query: 662 SDNEDEKVITRS VLLYTLDQI LRLLHPIMPFVTEE IF-- GQYAEGS I VLAS YPQVNATFE 719 

D+E K R L Y LDQILRLLHP+MPFV +++ + SIV ASYP N FE 
Sbjct: 680 GDDEQAKAAKRWLAYALDQILRLLHPVMPFVT5GKLWLALPHTGKSIVTASYPVANTAFE 739 

Query: 720 NQTAHKGVESLKI)LIRSVPJ>ISRAEVNVAPSKPITILVKTSDSELESFFKDNSNYIKRFTN 779 

N A ++++ LIR VR R E + ILVK +D L+ F+ N ++I RF N 

Sbjct: 740 NADATSAMDAIIALIRGWGIRKEAGAPLKTKVDILVKLTDPALKPIFEQNFDFIDRFW 799 

Query: 780 PETLEISSAIATPELAMSSVITGAEIFLPIADLLNVEEEIARLEKELAKWQKELDMVGKK 839 

+ + + +A P++A S+VITGA IF+PL +L++++EE A+L K+ K ++E+ + KK 
Sbjct: 800 SKAFTOGTDVAEPKMAGSAVITGATIFVPLNELIDLDEEKAKLTKDAKKLEQEIARIDKK 859 

Query: 840 LSNERFVANAKPEWQKEKDKQTDYQTKYDATIARIEEMKK 880 

L+N+ F++ A W +++ K++D++ + +T R+E++++ 
Sbjct: 860 LNNQGFLSKAPEAWAEQRTKRSDFEDQLTSTKQRLEQLQR 900 

A related DNA sequence was identified in S.pyogenes <SEQ ID 723> which encodes the amino acid 
sequence <SEQ ID 724>. Analysis of this protein sequence reveals the following: 



>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5062 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 782/878 (89%), Positives = 818/878 (93%)

Query: 64 DIIIRQKRMQGFDTLWLPGMDHAGIATQAKVEERLREQGISRYDLGREKFLDKVWEWKDE 123

Query: 244 GQNVILPIINKPIPIVADEHADPEFGTGWKITPAHDPNDFAVGQRHNLPQVNVMNDDGT 3

G+NVILPI+NK IPIV DEHADPEFGTGWKITPAHDPNDF VGQRHNLPQVNVMNDDGT
Sbjct: 243 GKWILPIWKDIPIVGDEHADPEFGTGWKITPAHDPNDFEVGQRHNLPQVNVMNDDGT 3

Query: 304 MNELADEFNGMDRFEARKAWAKLESLGNLWIKKTTHSVGHSERTGVVVEPRLSTQWFV 3

MNELA +F GMDRFEAR+A VAKLE IjG LV I+K HSVGHSER+G WEPRLSTQWFV
Sbjct: 303 MNEIAGDFAGMDRFEARQATVAKLEELGALWIEKRVHSVGHSERSGAVVEPRLSTQWFV 3

Query: 364 KI^QIiAKNAIANQDTEDKVEFYPPRFNDTFMSWMENVHDVWISRQUTOGHQIEAWXNVNG 4

KMD+LAK A+ NQ+T+D+V+FYPPRFNDTF+ WMENVHDWVI SRQLWWGHQI PAWYN G
Sbjct: 363 ramEI^QAMDNQETDDRVDFYPPRFtJDTFLQVMENVHDOTISRQLWWGHQIPAWYNAEG 4

Query: 424 EMYVGEDAPEGDGWTQDEDVLDTWFSSALViTFSTMGWPDTEAADFKRYFPTSTLVTGYDI 4

E+YVGE+APEGD WTQDEDVLDTWFSSALWPFSTMGWPDT+ DFKRYFPTSTLVTGYDI
Sbjct: 423 EIYVGEEAPEGDDWTQDED^/LDTWFSSALWPFSTMGWPDTDVEDFKRYFPTSTLVTGYDI 4


Query: 484 IFFWVSP^IFQSLEFTGRQPFSNVLIHGLIRDEEGRKMSKSLGNGIDPMDVIEKYGADAL 543 

IFFWVSRMIFQSLEFTGRQPF NVLIH3LIFJDEEGRKMSK3L3KGIDPMDVIEKYGAD+L 
Sbjct: 483 I FFWVSRMI FQSLEFTGRQPFQNVLIHGLIRDEEGRKMSKSLGNGIDPMDVIEKYGADSL 542 

Query: 544 RWFLSNGSAPGQDTOFSTOKMDASVTOFINKimiSRYIIiMNNEGLTLDQARENVEKVVNS 603 

RWFLSNGSAPGQDVRFSYEKMDASWNFINKIWWISRYILMNNEGLTL+ A NV KV S 
Sbjct: 543 RWFLSNGSAPGQDVRFSYEKMDASWNFINKIWNISRYILMNNEGLTLEDAESNVAKVAAS 602 

Query: 604 QVGIOTDRWILHNIJ^TOGKOTENFDKFEFGVAGHILYNFIWEEFAWyVELTKEVLYSD 663 

+ GNVTD+WILHNLNET+ KVTENFDKFEFGVAGHILYNFIWEEFANVnfVELTKEVLYSD 
Sbjct: 603 E1AGNOTDQWILHNLNETIAKVTENFDKFEFGVAGHILYNFIWEEFANWYVELTKEVLYSD 662 

Query: 664 NEDEKVITRSVLLYTLDQILRLLHPIMPFVTEEIFGQYAEGSIVLASYPQVNATFENQTA 723 

NE EKVITRSVLLYTLD+ILRLLHPIMPFVTEEI+ QYA+GSIV YP V FEN+ A 
Sbjct: 663 NEAEKVITRSVLLYTLDKILRLLHPIMPFVTEEIYAQYAQGSIVTYDYPWRPAFENEAA 722 

Query: 724 HKGVESLKDLIRSVRNSRAEVNVAPSKPITILVKTSDSELESFFKDNSNYIKRFTNPETL 783 

HKGVESLKDLIR+VRN+RAEVNVAPSKPITILVKT+DSELE FF N NYIK FTNPE L 
Sbjct: 723 HKGVESLKDLIRAVRNARAEVNVAPSKPITILVKTADSSLEDFFNSNINYIKCFTNPEKL 782 

Query: 784 EISSAIATPEIAMSSVITGAEIFLPtADLL!^EEIARLEKEIAKWQKELD^WGKKLSNE 843 

EISSAIA PEIAM+S+ITGAEI+LPLADL1KVEEELARL+KELAKWQKELDMVGKKL NE 
Sbjct: 783 E I SSAIAAPEL7AMTS 1 1 TGAE I YLPLADLIJS{VEEELAR]jDKELAKWQKEIiDMVGKKLGNE 842 

Query: 844 RFVANAKPEWQKEKDKQTDYQTKYDATIARIEEMKKL 881 

RFVANAKPEWQKEKDKQ DYQ KYDAT RI EMKK+ 
Sbjct: 843 RFVANAKPEWQKEKDKQADYQAKYDATQERIAEMKKI 880 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 228 

A DNA sequence (GBSx0242) was identified in S.agalactiae <SEQ ID 725> which encodes the amino acid 
sequence <SEQ ID 726>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0669 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 727> which encodes the amino acid 
sequence <SEQ ID 728>. Analysis of this protein sequence reveals the following: 

Possible site: 57 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

An alignment of the GAS and GBS proteins is shown below:

Identities = 148/191 (77%), Positives = 165/191 (85%)

Query: 14 GEKKKMNIIIIGAQASGKMTIGQEIAKQTGMTLFHNHDSIDFVLRFMPWSPDSIALTESI 73

Sbjct: 3 GAETKMNLIIIGftQASSKMTIGQEVARQ^GMTLFHKT-IDSIDFVLRPMPWSQESTALIERI 62

Query: 74

rf^fetfaktgq+miftivid™
Sbjct: 63 RPAFFETFAKTGQDMIFTIVIDFNDPNDVAMLEKIQAVFQSYDQEVLFVELKTDIEERLK 122

Query: 134 RNRTENFiKHKPSKEDIKWSESDICSTMDYAIFNPEVAPEALTYYHKINNTCLTATETAY 193
RNRTENRLKHKP KR+I+WSE DI STM YA+FNPE P+ LT+Y KINNT LTA ETA
Sbjct: 123 RMRTENRLI<HKPLIOOTIEWSEQDIQST^YAVFNPEEPPKTLTHYQKINNTQLTAAETAQ 182

Query: 194 LIIQKINQIKE 204
LIIQK+ IKE
Sbjct: IiIIQKMTHIKE 193

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 229 

A DNA sequence (GBSx0243) was identified in S.agalactiae <SEQ ID 729> which encodes the amino acid
sequence <SEQ ID 730>. Analysis of this protein sequence reveals the following:

Possible site: 49

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3614 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04556 GB:AP001510 unknown conserved protein [Bacillus halodurans]

Identities = 60/189 (31%), Positives = 102/189 (53%), Gaps = 3/189 (1%)

Query: 7 EIVDNQLPVVETNRLLLRQRKLEDAKEIFEFVKLDEVSYPAGFPAVKSLEEEITYIQEIY 66
E + LP +ET RL LR+ +DA I+++ ++V+ + +S+++ ++ +
Sbjct: 4 EDIYGDLPTLETERLRLRKFYKDDAAAIYDYASNEQVTKYVLWETHQSIKDSEAFLA--F 61



Query: 67 PTNLEKEKLPSGYAITIiKGDDKVIGS VDFNH - RHEDDIFE IG YLLHPDYWGQG I VPEAAS 125
N EK S +AI LK ++++IG+VDF + +D E+GY+L YWGQGI+ EA +
Sbjct: 62 ALNKYDEKDVSPWAIELKRNERMIGTvDFvVTOKPKDKTAELGYVL^ 121

Query: 126 ALVEIGFTLLGLHKIELGCYDYNKQSQAVARKLGFTLEANIRDRRDAQGKRCGDMRFGLL 185
ALVE GF 4 L +1+ C+ N S V K G E R +G + ++
Sbjct: 122 ALvEFGFM#lELERIQAKCFAEMISSARVMEI»GLIYEGTHRRAIYVKGAHRDFKVYAII 181

Query: 186 RSEWEKKRR 194
R ++E+K +
Sbjct: 182 REDYEQKHQ 190



A related DNA sequence was identified in S.pyogenes <SEQ ID 731> which encodes the amino acid
sequence <SEQ ID 732>. Analysis of this protein sequence reveals the following:

Possible site: 30

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1864 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 



